From lechner@cs.uml.edu Thu Nov 16 17:26:31 2006
From: Bob Lechner
Subject: Re: I Object[thread] -interesting comments on program testing - FYInfo
To: kbagley@us.ibm.com (Keith Bagley)
Cc: all

RJLRef: $PH/*/DataModelsTestsAlgs061116.txt

I believe Keith is completely right in his problem with the premise:
["... let's assume you have an appropriate set of reference tests
that you're confident in,"]

Obtaining confidence is easy for trivial (unneeded?) tests.
It may be impossible for really complex (critical) ones.
System development and evolution is so much broader than
programming contests can cover.

But regression testing is very useful during repetitive debug cycles
when interactions are not verifiable from static analysis and
proof-checking. And test management can be a big burden if not automated.

Of more interest (to me) is another person's comment that
Algorithms courses are rarely O-O-based.
I believe this is because algorithm research lacks my
data-models-first motivational opportunity:
New algorithms tend to invent NOVEL data structuring approaches
that are customized to optimize performance.
Later, those with broad applicability become candidates for
standardization, as with Design Patterns.
(Research usually precedes development.)

Bob Lechner

> From kbagley@us.ibm.com Thu Nov 16 15:57:58 2006
> From: Keith Bagley
>
> Interesting discussion.
> The problem I have with the premise is that this focus on testing misses
> the important point that testing is just one aspect of the entire Quality
> Assurance and Software Engineering equation. Assuming you were able to
> test and validate a program running correctly, you still have not
> validated that the requirements for the system were correct, or managed
> the traceability between the elements in your requirements space all the
> way through development, testing, deployment and even operations.
> The classic cost curve for some of this typically shows that an error found
> and corrected in a requirements engineering iteration costs significantly
> less than an error found and corrected during "test" (apply your own cost
> basis, but it's somewhere around 1:100).
> I think this points to the crux of the problem for many CS curriculums in
> that they focus on producing programmers instead of software engineers who
> understand the entire SDLC and the implications of using ad-hoc approaches
> for the core disciplines (including testing, design, requirements, and
> even project management).
>
> Keith
>
>
> Bob Lechner <lechner@cs.uml.edu>
> 11/16/2006 03:03 PM
>
> To
> alison_lea@yahoo.com, Keith Bagley/Bedford/IBM@IBMUS, cguffey@gmail.com,
> agabriel@cs.uml.edu, nitin.sonawane@verizon.net, nsonawan@cs.uml.edu
> (Nitin Sonawane), nitin.sonawane@gmail.com, lechner@cs.uml.edu (Bob
> Lechner)
> cc
>
> Subject
> Re: I Object[thread] -interesting comments on program testing - FYInfo
>
>
> Forwarded message:
> > From owner-sigcse-members@ACM.ORG Thu Nov 16 13:33:39 2006
> > Sender: SIGCSE Member Forum <sigcse-members@ACM.ORG>
> > From: Stephen Edwards <edwards@CS.VT.EDU>
> > Organization: Virginia Tech, CS Dept.
> > Subject: Re: I Object - approach for winning ACM teams
> > To: sigcse-members@ACM.ORG
> > In-Reply-To: <s55ad00b.059@ceccluster_cecgwia_server.cranbrook.edu>
> >
> > Alert: here's a long-winded reply to a tangent that
> > Rich Lamb started, and which has nothing really to
> > do with the objects-early/procedural debate.  Skip
> > it if you're not interested.
> >
> > Rich Lamb wrote:
> > > I run a free monthly programming contest.  I have always wondered how
> > > one might objectively score a (correctly-running) program beyond the
> > > criteria of time.  Any ideas out there?
> >
> > It turns out that determining whether a program is a
> > "correctly-running" program is fairly difficult in and of itself,
> > particularly if the problem is written in English.
> > But let's assume you have an appropriate set of reference tests that
> > you're confident in, and you're using those to make a (binary?)
> > determination about the correctness of a given program.
> >
> > Ian Utting has already suggested Cyclomatic complexity, and there are
> > a host of other potential metrics to consider.  Even size might
> > be useful (i.e., number of tokens).  Static analysis, from tools
> > like FindBugs, PMD, or Checkstyle can also provide useful
> > measures beyond traditional code metrics (of course, they are
> > specific to Java; other languages might have different static
> > analysis tools available).  However, here's a different
> > suggestion.
> >
> > When we grade programs in our courses, we require students to
> > test them.  We then grade them on how thoroughly they test their
> > own code.  We actually use code coverage tools to do this, and
> > can use relatively weak or much stronger criteria (e.g., just
> > method-level coverage, or just statement-level, or branch-level,
> > or MC/DC, etc.) depending on the level at which we expect students
> > to perform.  Students don't have to understand those terms or
> > concepts explicitly, since we give them a color-coded web printout
> > of their code that highlights lines that were not tested well enough,
> > together with an explanation of why.
> >
> > Anyway, this provides a way of measuring:
> >
> > (a) Does the program do what its author intended (i.e., what
> >      proportion of the author's tests pass)?
> >
> > (b) Has the author thoroughly expressed his or her understanding
> >      of what the program is intended to do (i.e., what proportion
> >      of the author's code is actually covered by his/her tests, using
> >      an appropriate coverage metric)?
> >
> > (c) How much of the problem space is correctly addressed by
> >      the program (i.e., what proportion of the reference tests
> >      are passed)?
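
Measures (a) through (c) above are all simple proportions, so they might be
computed along the following lines. This is only a minimal sketch: the class
name, field names, and the choice of line-level coverage are assumptions made
for illustration, not anything described in the thread, and the raw counts
are assumed to come from a test runner and a coverage tool.

```python
from dataclasses import dataclass


@dataclass
class SubmissionResults:
    """Raw counts for one submission (all field names are hypothetical)."""
    author_tests_passed: int
    author_tests_total: int
    lines_covered: int           # lines executed by the author's own tests
    lines_total: int             # executable lines in the submission
    reference_tests_passed: int
    reference_tests_total: int


def proportion(part: int, whole: int) -> float:
    """Return part/whole, treating an empty denominator as 0.0."""
    return part / whole if whole else 0.0


def measures(r: SubmissionResults) -> dict[str, float]:
    """Compute the three proportions labelled (a), (b), and (c) above."""
    return {
        "a_author_pass_rate": proportion(r.author_tests_passed,
                                         r.author_tests_total),
        "b_author_coverage": proportion(r.lines_covered, r.lines_total),
        "c_reference_pass_rate": proportion(r.reference_tests_passed,
                                            r.reference_tests_total),
    }


# Made-up counts, purely to show the call.
example = SubmissionResults(
    author_tests_passed=9, author_tests_total=10,
    lines_covered=45, lines_total=60,
    reference_tests_passed=16, reference_tests_total=20,
)
scores = measures(example)
```

Keeping the three numbers separate, rather than folding them into one grade,
matches the point that they are cross-supporting measures; how to weight them
is left open here.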
> >
> > If you're ambitious, you can even attempt to run the programmer's
> > own tests against a reference solution, in order to spot places
> > where the programmer has misunderstood (or failed to read) the
> > problem description, or even places where the problem description
> > itself was ambiguous or open to interpretation.
> >
> > If you do that, you can even use code coverage achieved on the
> > reference implementation to rate the "completeness" of the
> > programmer's own test cases (and to devise better, more comprehensive
> > reference test suites, too).
> >
> > The result is a number of cross-supporting measures that give
> > you a much better evaluation than a single yes/no answer on
> > a canned set of reference tests.
> >
> > OK, for those of you who know how often I spout off about
> > teaching students to test, I've just got to make one more
> > suggestion:
> >
> > If you're clever, you could take a pile of the last contest's
> > "correct" submissions for a problem, and form a completely
> > different kind of contest: a software testing contest.  You
> > post the problem, and ask competitors to develop test suites
> > for the problem.  You gauge correctness by running them
> > against your collection of known-good solutions, and use
> > measures like the numbers of tests that fail incorrectly and
> > code coverage (perhaps using multiple metrics, maybe
> > even weighted) to score submissions.  Perhaps you could
> > even award bonus points for the number of hidden bugs
> > found in the solutions you "thought" were correct (there's
> > always a gotcha!).  Just an idea ...
> >
> >                                   -- Steve
> >
> > --
> > Stephen Edwards            604 McBryde Hall          Dept. of Computer Science
> > e-mail      : edwards@cs.vt.edu           U.S. mail: Virginia Tech (VPI&SU)
> > office phone: (540)-231-5723              Blacksburg, VA  24061
> > -------------------------------------------------------------------------------
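
The testing-contest idea in the forwarded message (run each submitted test
suite against known-good solutions, count tests that fail incorrectly, and
combine weighted coverage metrics) might be scored roughly as follows. This
is a sketch under stated assumptions: the (input, expected) test format, the
function names, the 0-100 scale, and the penalty value are all hypothetical,
and the coverage fractions are assumed to be supplied by a separate tool.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical test-case format: (input value, expected output).
TestCase = tuple[int, int]
Solution = Callable[[int], int]


def false_failures(suite: Sequence[TestCase],
                   solutions: Sequence[Solution]) -> int:
    """Count tests that at least one known-good solution does not pass."""
    return sum(
        1 for arg, expected in suite
        if any(solve(arg) != expected for solve in solutions)
    )


def score_suite(suite: Sequence[TestCase],
                solutions: Sequence[Solution],
                coverage: Mapping[str, float],
                weights: Mapping[str, float],
                penalty: float = 5.0) -> float:
    """Weighted coverage on a 0-100 scale, minus a penalty per false failure.

    `coverage` maps a metric name to the fraction (0.0-1.0) achieved on the
    known-good solutions; `weights` gives each metric's relative weight.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[m] * coverage[m] for m in weights) / total_weight
    return 100.0 * weighted - penalty * false_failures(suite, solutions)


# Made-up example: two known-good "square" solutions and a three-test suite
# whose last expected value is wrong.
known_good = [lambda x: x * x, lambda x: x ** 2]
suite = [(2, 4), (3, 9), (4, 15)]
contest_score = score_suite(
    suite, known_good,
    coverage={"statement": 0.9, "branch": 0.7},
    weights={"statement": 1.0, "branch": 2.0},
)
```

A test counted as a false failure here could instead be exposing a hidden bug
in a solution that was only thought to be correct, which the message suggests
rewarding with bonus points; telling the two apart would need a human look at
each disagreement.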