Intelligent Support for Testing in Languages for Informal Programmers

Margaret Burnett, Gregg Rothermel, and Curtis Cook
Oregon State University

© 1999 Margaret Burnett, Gregg Rothermel, and Curtis Cook

Recently, a number of new languages have been created that feature high degrees of concreteness and immediate visual feedback. The presence of concreteness and feedback has been motivated in part by a sort of "instant testing" goal: the idea is that a user who immediately sees the result of a program edit will spot programming bugs as soon as they are made, and hence will be able to eradicate them right away. This motivation has been especially prevalent in end-user languages, but it is also found in languages for professional programmers.

Spreadsheet systems, which may be the most widely used type of "programming language," provide this high degree of concreteness in the form of sample values, and their automatic recalculation feature provides immediate visual feedback after every formula edit. Spreadsheet systems are used by a wide variety of people, ranging from end users creating spreadsheets for their own use (such as to calculate their students' grades) to professional programmers creating spreadsheet templates for sale (such as to calculate income taxes). Between those two extremes are the informal users who are of particular interest in this workshop: people who create spreadsheets both for their own use and for other people to use, such as an office manager creating budget spreadsheets for the rest of the office staff.

Spreadsheet systems provide evidence that concreteness and immediate visual feedback alone have not led to much success at finding bugs or at removing them. There is a substantial body of research showing that spreadsheets often contain bugs. For example, field audits of real-world spreadsheets have found that 20% to 40% of them contain bugs, and that between 1% and 4% of all cells contain bugs [Teo and Tan 1997]. Also, in an early empirical study of experienced spreadsheet users, 44% of the spreadsheets created by those users were found to contain user-generated bugs [Brown and Gould 1987]. Results of several later studies have been similar: between 10% and 90% of the spreadsheets examined have been found to contain bugs. (See [Panko and Halverson 1996] for a survey of these studies.) Compounding this problem, creators of spreadsheets seem to express unwarranted confidence in the reliability of their programs [Wilcox et al. 1997].

We have been working to bring at least some of the benefits of formalized notions of testing to the informal, incremental development world of spreadsheets [Rothermel et al. 1998]. Our strategy was to start by developing a specific definition of what it means for a cell to be tested enough (called a "test adequacy criterion" by software engineering researchers). Using this notion to define the ideal, the system continuously communicates to users how close their testing activities have brought each cell to this ideal. To do this, our system treats each user "decision" about correctness as a test, and the user communicates those decisions to the system by checking off a value whenever he or she notices that it is correct. The system tracks these tests and their implications, and also keeps track of which previous tests are undone as a result of formula edits. This approach provides feedback about testing adequacy at all stages of spreadsheet development, with the intent of helping users detect bugs in their spreadsheets. We have implemented a prototype of the approach, which is integrated with the research spreadsheet system Forms/3 [Burnett and Gottfried 1998].
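The check-off and invalidation cycle described above can be illustrated with a small Python sketch. It is a deliberate simplification and not the Forms/3 implementation: the names Cell, Sheet, check_off, and testedness are hypothetical, a formula is represented only by the names of the cells it reads, and the cell-level measure (a formula cell counts as tested once its current value has been checked off) is an assumed stand-in for the actual test adequacy criterion, which is not spelled out here.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    inputs: list = field(default_factory=list)  # names of cells the formula reads
    validated: bool = False                     # True once the user checks it off


class Sheet:
    """Hypothetical toy model of check-off testing, for illustration only."""

    def __init__(self):
        self.cells = {}

    def define(self, name, inputs=()):
        """Create a cell, or edit its formula (modeled only by its inputs)."""
        self.cells[name] = Cell(name, list(inputs))
        # A formula edit undoes earlier tests of this cell (the new Cell
        # starts unvalidated) and of every cell that depends on it.
        for dependent in self._downstream(name):
            self.cells[dependent].validated = False

    def check_off(self, name):
        """Record the user's decision that the displayed value is correct."""
        self.cells[name].validated = True

    def _downstream(self, name):
        """Names of all cells that directly or indirectly read `name`."""
        found, frontier = set(), [name]
        while frontier:
            current = frontier.pop()
            for cell in self.cells.values():
                if current in cell.inputs and cell.name not in found:
                    found.add(cell.name)
                    frontier.append(cell.name)
        return found

    def testedness(self):
        """Fraction of formula cells whose current value has been checked off."""
        formula_cells = [c for c in self.cells.values() if c.inputs]
        if not formula_cells:
            return 0.0
        return sum(c.validated for c in formula_cells) / len(formula_cells)


sheet = Sheet()
sheet.define("quiz")                     # input cells have no formula
sheet.define("exam")
sheet.define("total", ["quiz", "exam"])
sheet.define("grade", ["total"])

sheet.check_off("total")
sheet.check_off("grade")
print(sheet.testedness())                # 1.0

sheet.define("total", ["quiz"])          # formula edit undoes both tests
print(sheet.testedness())                # 0.0

sheet.check_off("total")
print(sheet.testedness())                # 0.5
```

In this sketch the feedback is a single number for the whole sheet. A system following the approach would instead report testedness for each cell and update it after every check-off and formula edit.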

We have been working on this approach for the last couple of years, and have devised algorithms for the basic mechanisms that are compatible with the need for a high degree of responsiveness. We have also devised a visual representation of "testedness" that, in our opinion, does not require a formal background in programming or testing. We have completed some empirical work showing that the test adequacy criterion we are using does indeed reveal common bugs in spreadsheets. We are now in the middle of empirical work to learn whether programmers, end users, and other informal users become more effective in their testing and debugging when they are supported by the approach.

[Brown and Gould 1987] P. Brown and J. Gould, "Experimental Study of People Creating Spreadsheets," ACM Transactions on Office Information Systems 5, 1987, pages 258-272.

[Burnett and Gottfried 1998] M. Burnett and H. Gottfried, "Graphical Definitions: Expanding Spreadsheet Languages through Direct Manipulation and Gestures," ACM Transactions on Computer-Human Interaction 5(1), March 1998, pages 1-33.

[Panko and Halverson 1996] R. Panko and R. Halverson, "Spreadsheets on Trial: A Survey of Research on Spreadsheet Risks," Hawaii International Conference on System Sciences, Maui, Hawaii, Jan. 2-5, 1996.

[Rothermel et al. 1998] G. Rothermel, L. Li, C. DuPuis, and M. Burnett, "What You See is What You Test: A Methodology For Testing Form-Based Visual Programs," International Conference on Software Engineering, April, 1998, pages 198-207.

[Teo and Tan 1997] T. Teo and M. Tan, "Quantitative and Qualitative Errors in Spreadsheet Development," Hawaii International Conference on System Sciences, Jan. 1997, pages 149-155.

[Wilcox et al. 1997] E. Wilcox, J. Atwood, M. Burnett, J. J. Cadiz, and C. Cook, "Does Continuous Visual Feedback Aid Debugging in Direct-Manipulation Programming Systems?" ACM Conference on Human Factors in Computing Systems (CHI '97), March 22-27, 1997, pages 258-265.