91.522 GENCPP Team—Mark Ottesen and Katherine Miu

Enhancements to GENCPP

 

Replacing Intrusive Data Structures with Non-Intrusive Data Structures

 

December 16, 2000; revised 2005/09/11, 2006/12/10[1]

 

Table of Contents

 

Background............................................................................................................................... 2

Goals......................................................................................................................................... 2

Compilability........................................................................................................................ 2

Portability.............................................................................................................................. 2

Encapsulation........................................................................................................................ 2

Information Hiding............................................................................................................... 3

Compatibility........................................................................................................................ 3

Intrusive vs. Non-intrusive Data Structures............................................................................. 4

STL Lists vs. Vectors............................................................................................................ 7

Process...................................................................................................................................... 8

Bug fixes – Port to Win32/Visual C++................................................................................. 8

Refactoring database output files....................................................................................... 10

Container operations........................................................................................................... 18

Changes to macros and the API.......................................................................................... 23

Testing.................................................................................................................................... 26

Future Work............................................................................................................................ 33

Appendix A - Porting Changes and Bug Fixes to gencpp...................................................... 36

Appendix B - Source and documentation locations............................................................... 46

Appendix C - chgen to gencpp API Refactoring Hints.......................................................... 47

Bibliography........................................................................................................................... 51

 


Background

 

Gencpp stands for C++ generator.  It is an application which came into being as an extension of an older application, chgen.  The purpose of gencpp is to generate C++ code to provide a memory-resident network database based on a relational schema provided by application developers and to support a C++ language API to that database.  The remainder of this document discusses modifications and enhancements to the code generated by gencpp unless otherwise noted.  All changes relating to the generated code will be worked back into gencpp so that gencpp will create the modified files.  To further clarify the difference between gencpp and the code it generates, this report will refer to the generated code as "database code" or “generated database code”.

 

Goals

 

Compilability

The first goal of this project has been to change gencpp in such a way that the code it generates will compile and link without alteration.  Before we began changing the output generated by the current version (ntansala’s version) to support our architectural changes, we needed to debug and fix the current version so that it met this goal.  The most obvious difference between the code generated by chgen and the code generated by gencpp is that chgen generates C code, and gencpp generates C++ code. Since the C language is entirely a subset of C++, porting an application from C to C++ can be as simple as using a C++ compiler instead of a C compiler.  However, C++ requires stricter type checking than C does.  For instance, in C++ a function must have a return value type declared, even if it is void.  C does not have this requirement.  In addition, there are rules in ANSI C++ about how void* can be used that are stricter than those in C.  A list of the changes and bug fixes required to the original gencpp is listed in Appendix A. 

 

Portability

            The second goal of this project has been to have gencpp compile with gcc versions 2.8.1 and 2.9.5, as well as Visual C++ versions 5 and 6.  In addition, this project has also made the generated database code compile with g++ versions 2.8.1 and 2.9.5, as well as Visual C++ versions 5 and 6.  In theory, C and C++ are portable languages such that if the programmer follows ANSI standards, code that will compile on one platform will also compile on any other platform with any other compiler that claims to be ANSI-standard compliant.  Anyone who has written a program longer than twenty lines and has had to compile it on different hardware platforms, or even the same platform and operating system with a different compiler, knows that reality is quite different.  In order for the generated code to be useful in applications written for the Windows environment, such as the Visual C++ port of BDE, it has to compile with Visual C++.  Again, a list of the changes and bug fixes required to the original gencpp is listed in Appendix A.

 

 

Encapsulation

The third goal of this project is to model a relational database table and database row by defining a table class and row class, respectively.  As previously stated, the C language is entirely a subset of C++.  That being said, there are features available in C++ that were not available in C.  Because of its roots in C, C++ isn’t considered to be a pure object oriented language like Smalltalk or even Java.  However, as a hybrid object-oriented language, C++ has many facilities to support the object-oriented paradigm.  The current version of gencpp is a first step in the evolution of chgen into the realm of C++ language support and the additional facilities it contains by creating C++ classes whose instances are rows of a database table.  At this time, these rows are the only objects in the system represented by classes.  This project enhances the database code such that tables are represented by classes as well, and the one to many relationship between these two objects is properly defined and attributed. 

 

Page-Jones (p.9) defines encapsulation as “the grouping of related ideas into one unit, which can thereafter be referred to by a single name”.  That being said, there are operations in the current row class that are more closely related to the table object than the row object.  These operations have been moved from the row class into the table class.  In addition, certain responsibilities pertaining to the relationship between rows and tables have been moved from one type of row object to another.  These will be discussed in greater depth.

 

Information Hiding

The fourth goal of this project is to hide the implementation details of the generated database code, with the idea that the underlying implementation can be replaced at a later time, if desired, with a minimal impact on client code.  The current version of gencpp creates code that starts down the path toward using the most common idioms of object-oriented programming by combining attributes common to every row object into a class and hiding the implementation of these specific attributes behind accessor and modifier methods.  However, the internal member variables of the class that are part of the implementation of the class are publicly visible.[2]  These include the next_ptr and prev_ptr member variables in the row class that are used to implement the table, as well as the <child>_fcp, <child>_bcp members of the parent row class and the <parent>_fpp, <parent>_bpp, and <parent>_pp members of the child row class.  Hiding these implementation details by making them private allows the implementation of the parent-child relationships to be transparent to the client application.  This hiding of the implementation is an important aspect of object-oriented design.

 

Compatibility

The fifth goal of this project was to make our changes in such a way that the database code generated under the new method would be completely compatible with applications written to use the old gencpp API.  This would demonstrate that a complete redesign of the implementation of tables and parent-child relationships within the classes in the database code generated by gencpp could be done in such a way that it is completely transparent to client applications.  While the data types of certain global variables in the database code have changed types, client applications can be programmed so as to make their code entirely compatible with both the new and old versions of gencpp in a transparent fashion.[3]

 

Intrusive vs. Non-intrusive Data Structures

Once the implementation is hidden, we then have the option of changing it.  The database code uses what Soukup would call intrusive data structures, structures that contain pointers within the application objects, to maintain database tables and rows.  In fact, a table is nothing more than a linked list of rows.  One thing that the C++ language provides above and beyond what the C language provides is the Standard Template Library, contributed to the C++ standard by Alexander Stepanov and Meng Lee at Hewlett Packard.  The Standard Template Library provides a set of containers for various purposes, including the list and the vector.  These containers are non-intrusive, in the sense that the internal pointers that connect elements of the container are not embedded within the object itself.  The feasibility of using these non-intrusive containers is discussed below. 

 

There are advantages and disadvantages to both approaches. The current intrusive implementation has the advantage of having less overhead in terms of the individual size of the object.  However, as Soukup himself says (C++ Report, 1998), “Whether the list uses one or two pointers is usually insignificant even in applications where memory space is at a premium, because the objects have other data members, and the addition of one pointer represents only a relatively small increase in object size.  Traversing non-intrusive lists requires an additional pointer jump, which again is insignificant compared to other processing the application would do while traversing a list”. All other things being equal, smaller and faster is always better than bigger and slower, especially in high volume transactions.  Soukup goes on to say that in a particular high-volume model pertaining to large telephone switches, he replaced “non-intrusive data structures with intrusive ones, and the speed almost doubled”.  The reason for this was because “In applications which frequently create and destroy objects, the main overhead comes from the allocation, initialization, and destruction of objects, not from adding and deleting items from the list. Non-intrusive lists include twice as many objects, or require periodic reallocation and copying of the array”.  While this is true, it doesn’t necessarily fit the needs of the applications that use our code today.  The largest current application using the generated database code, the Juice Plant Simulation, contains only 1500 lines of data in all of its tables combined.  Furthermore, the range of our current key structure is a limit on scaling up the size of application (E-mail from R. Lechner to 91.522 class on 11/16/2000).  As far as the overhead of periodic reallocation and copying of the array, the list container from the STL does not have this problem, only the vector does.  Even at that, Prof. Lechner wrote to the current project team in a separate e-mail dated 12/03/2000, “I don't believe copying the entire Vector during resizing is a significant performance hit. Memcpy of the entire array is fast, probably a lot faster than the <pr_create/set_field+/pr_add> incremental construction time (even ignoring parsing) and the cumulative number of copy operations is just the new Vector size (or twice the old size).               

 

Another advantage cited for intrusive data structures over non-intrusive ones is that random insert or remove operations from a doubly linked list are almost instant, but a linear search is necessary for the non-intrusive list.”(Soukup).  This does not apply to our case.  Rows are inserted into database tables in sorted primary key order, so a linear search is required regardless of whether the data structure is intrusive or non-intrusive.  In addition, removal from tables and parent-child lists does not happen at random.  Only particular values are removed, requiring a linear search to find them whether or not the containers are intrusive or non-intrusive.  When talking about insertion, the largest number of insertions happen for our memory-resident database when the application is initialized and loaded from disk.  The current implementation of “pr_load pr_links TOO MUCH, eagerly” (E-mail from R. Lechner to 91.522 class on 11/16/2000).  This is not necessarily a statement about the general debate between intrusive and non-intrusive data structures, but it does speak to the particular application being studied. 

 

In gencpp generated code, the developers of gencpp over the years have been responsible for the maintenance and evolution of the intrusive data structures.  On the other hand, the compiler manufacturers are responsible for providing the implementation of the STL.  This gives the STL a distinct advantage that has nothing to do with performance or even the debate between intrusive and non-intrusive data structures.  By virtue of being a client user of a library, we don’t have to be responsible for the direct manipulation of the pointers in the implementation of our table and our parent-child relationship – the library is responsible for this.  Regardless of the implementation, we can use a full library of code developed and fully tested by professionals before we write one line of code in a single-semester student project.  Given the time restrictions usually imposed upon student projects, the impact of this can not be overstated.  The implementation of the insert_element macro in the current generated database code takes up forty-three lines of pointer manipulation code that is difficult to read.  Our STL container implementation of the table object cuts this down to one very clear line of code.  In addition, a macro can’t be debugged in a source-level debugger such as gdb because it is preprocessed code.  Neither can it be read directly in the case of compiler problems, again because it is not directly in the source code.  Inline template functions can serve the same purpose in C++ as macros in C, but that doesn’t help with the forty-two unnecessary lines of source code to generate and debug.

 

The STL container classes provide another form of abstraction from the internal implementation of the data structure by having a common interface to all container types through common methods and iterators that behave in a syntactically identical fashion.  For example, we can write an implementation today using lists for compatibility reasons.  On the other hand, should we change our minds and choose to use vectors in a year, this change can be made to the entire set of database code simply by replacing the type of variable defined for the container and each iterator.  This is because the operations on the container remain the same regardless of type.  This particular experiment has been done in this project. In addition, the C++ standard requires certain performance guarantees from implementers of the STL.  For instance, a vector is required to provide constant-time access to it’s elements, otherwise it isn’t a valid vector implementation.  (Meyers, p. 229) .

 

In addition, using the STL provides access to a whole set of algorithms dedicated to the manipulation of the containers it provides.  Tom Janzen enumerates some of these in his project report from the Fall semester of 1997.  These algorithms provide a whole range of operations for use with container.  The algorithms, once again, have performance standards imposed on them by the C++ standard.  For example, the stable_sort must perform in O(N log N) time in order to comply with the standard (Meyers, p. 230).

 

From a philosophical point of view, another question arises. In the current intrusive implementation, the member variables in the class and the objects they reference really don’t reflect the real world relationships between the objects in the information model.  For instance, the fkey_fcp and fkey_bcp pointers in a child row object represent a “next sibling” and “previous sibling” relationship between child row elements.  Is this really the relationship being abstracted by the objects?  The purpose of these variables is to reflect a relationship between the child row objects and their parent rows.  The relationship between one child row and the next according to “next sibling” and “previous sibling” is purely coincidental and a consequence of the one to many relationship from the parent to the child relationship.  Normally this would be implemented by having only a referential attribute in the CO object pointing at the single parent SD object.  In this case, that is reflected by the SDid_pp attribute.  However, Tepfenhart says by way of Rumbaugh that “if fast traversal is critical, then it would be better to add attributes to both sides of the association.  In this case, the attribute added to the class on the ‘one’ side is not a buried pointer, but a set of pointers to a class object on the other side”  (UML and C++ [Prentice Hall, 1997] p.259).  In this project, a C++ standard library container class has been used in the parent row to implement this relationship, eliminating the need for previous and next pointers in the child row class.  The direct parent pointer has been maintained. 

 

One should bear in mind that in practical terms, a certain level of intrusiveness is necessary.  For instance, in the case of a parent-child relationship, in order to provide bi-directional navigability between parent and child, the child must have a pointer to the parent.  “Therefore the organization is intrusive at its very heart”.  (Soukup)

[4]

 

In addition, the table as an object does not exist in the current intrusive.  Like the relationship between sibling rows of a common parent, the relationship between a set of rows in the same table is really just a consequence of being multiple objects of the same type which can collectively be called a table.  Currently, the row class is responsible for maintaining the abstraction of the table by maintaining links to the next and previous rows with the next_ptr and prev_ptr attributes.  Like the parent-child relationship, these attributes don’t abstract a real world property of the row.  By implementing the table as a separate object, the row classes in turn are a more direct reflection of the objects they are designed to abstract, the one to many relationship between the table and the rows of data it contains.

 

What this project has done is implement the tables and parent-child relationships as STL list objects.  In the case of the table, it is a globally defined object containing a private list of pointers to rows of the table.  In the case of parent-child relationships, the model suggested by Lee and Tepfenhart cited earlier is followed.  For each foreign key in the child row (the “many” side of the relationship) that points back to the parent, there is a direct pointer back to the parent row (the “one” side of the relationship) for that particular field.  The parent row contains a list of pointers for each foreign key from a child that points back to it.  A simple example from a student database representing the prerequisite relationship between courses is shown below.

 

 

 

 

 

 

 

 

 


0..M

 

0..M

 
 

 

 

 

 


            In this example, suppose that in the associative object prerequisite, COid1 is an advanced course that has a course COid2 that is required as a prerequisite.  In this case, a given course object will contain two lists. The PRid1_children list member will contain pointers to each course that has the given course as a prerequisite, and the PRid2_children list will contain pointers to each course that it is a prerequisite for.

 

            Another advantage if having the container of child lists in the parent object is that the data structure is homogenous in content.  The current implementation, being a circular linked list which contains the parent object in addition to all the child objects, is not.  This causes a problem because the _fpp and _bpp pointers in the child object become type (void*), leading to compilation problems with C++ compilers when the (void*) member is cast or dereferenced.  Since C++ is stricter in its type-checking than C, this is an example of something that used to work with a C compiler that no longer is legal in C++. 

 

 

STL Lists vs. Vectors

 


STL list containers are used for these objects.  We considered the STL vector container, but chose lists because of differences in the implementation between the list and the vector, and compatibility with existing applications.  The analogy between STL lists and vectors is quite similar to the analogy between an array and a dynamically allocated linked list in standard C programming.  A vector represents a contiguous area of memory in which each element is stored in turn.  A list represents noncontiguous memory doubly linked through a pair of pointers to the front and back.  The vector supports very efficient random access because each access is a fixed offset from the beginning of the vector. [5] The list, on the other hand, does not support direct random access to its elements, so retrieval of elements by direct address can be slower through the list class.  It takes the same amount of time to insert at the beginning or end of either a vector or a list.  However, if the insertion needs to happen in other locations, the list is the clear winner over the vector.  This is because in the vector, each element after the inserted element must be recopied and moved, just as if you were inserting a new element into an array where a new slot had to be created for it in the middle.  This presents a much bigger problem for us than just performance.  Suppose we have retrieved a pointer or an STL iterator to an element in the vector.  We then insert a new element at some point in the vector before the element we have stored a reference to.  The insertion will cause our stored element to a new location in memory, causing our pointer or iterator to no longer be valid.  This is extremely problematic for us because table elements are stored in sorted order, meaning that we are likely to insert elements in locations throughout the table at any point in time.  With a list, our pointers and iterators will always remain valid, so we don't have to worry about this.  Each time a new element is inserted into a list, the front and back pointer of the two elements between which the new object is being inserted are reassigned to point to the new object, and the front and back pointers of the new object in turn are initialized to address these two elements.  This problem manifests itself in a much larger scale when the vector resizes itself.  When a vector reaches its capacity, it resizes itself by allocating double its current capacity, copying its current values into the new allocated memory, and freeing the previous memory.   However, the point at which the memory is reallocated is implementation dependent.  The version of the STL that ships with gcc starts with an initial size of 1, and then doubles its size when it reaches its capacity.  The RogueWave implementation starts with a size of 256 for integers, 128 for doubles, and smaller sizes for larger objects, depending on their complexity.  A vector can be resized using its resize() method, or an initial size can be set using the reserve() method, allowing us to keep track of pointers and iterators we know about and updating them when they need to be updated, but we can’t keep track of iterators that the user declares for their own use.  As far as performance goes, even with many random insertions, a vector will generally be faster than a list with small objects, but for a complex object (meaning one with a copy constructor and/or an overloaded assignment operator), a list is faster, as shown in the tables below.

 

Time in Seconds to insert 10,000,000 elements

Data Type

List

Vector

Int

10.38

3.76

Double

10.72

3.95

Simple Class

12.31

5.89

String

14.42

11.80

 

               Time in Seconds to insert 10,000 elements

Data Type

List

Vector

Large Simple Class

0.36

2.23

Large Complex Class

2.37

6.70

 

               This chart is from (Lippman, p.258).  This shows that there are circumstances where a list is clearly preferable in performance to a vector.

Process

 

Bug fixes – Port to Win32/Visual C++

 


            The first step in this project was to get the original gencpp application to compile and run without warnings, as well as to have it generate code that would compile and run without warnings.  In order to be able to work in the graphical environment of Visual C++, this project produced changes such that gencpp would compile and run with this tool.

 

 

Building gencpp on NT

 

            Here are the steps that need to be taken to build gencpp in the Visual C++ environment.  This was done with Visual C++ version 5.0.

 

1)     After invoking the Developer Studio environment, choose File->New

2)     Choose Win32 Console Application

3)     Type in a name for your project.  The location should be the directory which contains the gencpp source, and the Create New Workspace radio button should be selected.

4)     Select Project->Add To Project->Files, and select all of the source files in the directory with the exception of gen_lut.c, hcg_parse.c, lut.c, mystrcpy.c, and strncpy_null.c.  Selecting the .h files is optional.

5)     Select Project->Settings, click the C/C++ tab, and the All Configurations setting in the drop-down box in the upper left corner.  In the Preprocessor Definitions edit box, add “__STDC__” to indicate full conformance with the ANSI C standard.

6)     To build gencpp, choose Build->Rebuild All or Build->Build gencpp.exe

 

 

A list of changes to gencpp necessary to generate database code that compiles and

links with both Visual C++ and g++ is provided in Appendix A.  These are not limited to porting changes.  When this project was started, the generated code would not compile under g++.  The code generated now compiles, although under Visual C++ there are still some warnings about comparisons between signed and unsigned integers.

 

Building Database Applications

 

Visual C++ does not recognize the “.cc” extension as a valid file extension for C++ source files.  The convention in Visual C++ is to use the extension “.cpp”.  Gencpp has been updated to generate .cpp files if the symbol “WIN32” is defined by the compiler.  This symbol is defined on Windows 32 bit platforms under the Visual C++ compiler.  It is not necessarily defined using other compilers even on Windows 32 bit platforms.  For example, one of the authors tested this project on Windows 95 using a port of gcc and g++ to the Win32 platforms.  This was for the purpose of verifying that changes made during the port to Visual C++ left the code in a state such that it would still build and run in a UNIX environment using gcc.  With gcc on Windows 32 bit platforms, “WIN32” is not defined, so the “.cc” file extension is still used as it is on UNIX platforms.

 

In order to leave the generated database code in such a state that it could be tested under both the new and the old methods of storing tables and parent-child relationships,  the database code is generated so that if the “USE_STL” flag is defined, the new implementation of tables and parent-child relationships using STL lists is used.  If this flag is not defined, client applications will use the older intrusive internal representation of tables and parent-child relationships.  If the “USE_STL” flag is intended to be defined when compiling the generated code, the “-nobp” command line option to gencpp is irrelevant, because the forward and backward pointers to parents and children within the row objects are now hidden within the STL list object members of the row classes.

 

The steps necessary to set up a workspace and project in Visual C++ to build a project and workspace for a database client application depends on the nature of that application.  To set up a Win32 Console Application, the same steps can be used for this workspace as were used for gencpp itself.  If you are using the database code with an MFC application or within a dynamic link library, the steps will need to be adjusted accordingly.  As with the previous version of gencpp, “#define MAIN” must be at the top of the file that contains main as its entry point.

 

Under Visual C++, there are still a few warnings about unreferenced local variables in the add_row and link_row class methods of the row class when the USE_STL macro is defined.  This is because these variables are used in these functions when the macro is not defined, and as such needed to be left in the generated code.  Fixes to gencpp in order to have the generated code be compilable under g++ and Visual C++ are included in Appendix A.

 

 

Refactoring database output files

 


The evolutionary path of gencpp from chgen has been one of taking incremental steps from an application generating code in a language that is not object oriented by nature towards generating code in a language that is more object oriented by nature.  The first step was to have the code generate classes for some objects in the system, namely the rows of the table.  This was a big step, and a necessary first step to any other work.  The C code generated by chgen used a series of macros as its API as a means of hiding the implementation details of table objects and relationships, as well as to provide a typeless means of reusing code internally. 

 

            The refactoring of generated output files starts with the row classes.  Please see the specification document for phase two of this project for a complete list of the code changes in the output files.  This document can be found in $CASE/2kf522/kmiu/gencpp_final_2000Dec/doc/STLspec5.doc. Using the data model described above in “STL Lists vs. Vectors”, here are the row class declarations generated using the old method

 

class CO : public RC               /* CO - Course information */

{

private :

    hcg_key DEid;   /* foreign pkey of dept */

    char  name[16]; /* course name */

public :

 

    DE   *DEid_pp;

    void *DEid_fpp;

    void *DEid_bpp;

    PR   *PRid1_fcp;

    PR   *PRid1_bcp;

    PR   *PRid2_fcp;

    PR   *PRid2_bcp;

 

    CO   *prev_ptr;

    CO   *next_ptr;

 

    void set_DEid (hcg_key id);

    void set_name (char  *value);

 

    hcg_key get_DEid ();

    char* get_name ();

 

    CO*  create_row();

    void link_row () ;

    void parse_row (char *buffer, int idx, hcg_key hcg_k) ;

    void add_row (char *viewname) ;

    void delete_row () ;

    void dump_table (char *viewname, char *file_name, int new_version, char *modestr) ;

    void dump_row (char *viewname, char *file_name, int new_version, char *modestr) ;

} ;

 

class PR : public RC               /* PR - Prerequisite */

{

private :

    hcg_key COid1;  /* foreign pkey of course that has this prereq */

    hcg_key COid2;  /* foreign pkey of course that describes this prereq */

    char  test[36]; /* field used for testing only */

public :

 

    CO   *COid1_pp;

    void *COid1_fpp;

    void *COid1_bpp;

 

    CO   *COid2_pp;

    void *COid2_fpp;

    void *COid2_bpp;

 

    PR   *prev_ptr;

    PR   *next_ptr;

 

    void set_COid1 (hcg_key id);

    void set_COid2 (hcg_key id);

    void set_test (char  *value);

 

    hcg_key get_COid1 ();

    hcg_key get_COid2 ();

    char* get_test ();

 

    PR*  create_row();

    void link_row () ;

    void parse_row (char *buffer, int idx, hcg_key hcg_k) ;

    void add_row (char *viewname) ;

    void delete_row () ;

    void dump_table (char *viewname, char *file_name, int new_version, char *modestr) ;

    void dump_row (char *viewname, char *file_name, int new_version, char *modestr) ;

} ;

 

            Regardless of the internal implementation of the tables and the parent-child relationships, that fact that the internal pointers prev_ptr and next_ptr used to implement the tables are public doesn’t conform to the object-oriented concept of information hiding.  In addition, the pointers COid1_pp and COid2_pp used in class PR to point directly back at their parent rows are public.  So are the *_fcp and *_bcp pointers in class CO, and the *_fpp and *_bpp pointers in class PR.  Even if one should choose intrusive data structures over non-intrusive ones, these pointers would have to be made private and wrapped in accessor methods. 

 

            Here is the code for the same two row classes using the new method of storing parent-child relationships and tables externally.  State diagrams describing the behavior of these classes were created for phase two of this project and can be found in $CASE/2kf522/kmiu/gencpp_final_2000Dec/doc/State_diagram_for_gencpp.doc

or $PH/COOL-GEN/gencpp/phase2_state_diagrams.doc

 

 

class CO : public RC

{

private :

    hcg_key DEid;   /* foreign pkey of dept */

    char  name[16]; /* course name */

 

    DE   *DEid_pp;     /* Parent     */

    list<PR *> PRid1_children;  /* Child list */

    list<PR *> PRid2_children;  /* Child list */

 

public :

    void set_DEid (hcg_key id);

    void set_name (char  *value);

 

    hcg_key get_DEid ();

    char* get_name ();

 

    void insertChild_PRid1(PR *childRow);                          /* Adds a child to list */

    void removeChild_PRid1(list<PR*>::iterator rowIter);           /* Remove child         */

    list<PR*>::iterator getFirstChild_PRid1();                     /* Returns 1st child    */

    list<PR*>::iterator getLastChild_PRid1();                      /* Returns last child   */

    list<PR*>::iterator PRid1_Terminator();                        /* Returns terminator   */

    void insertChild_PRid2(PR *childRow);                          /* Adds a child to list */

    void removeChild_PRid2(list<PR*>::iterator rowIter);           /* Remove child         */

    list<PR*>::iterator getFirstChild_PRid2();                     /* Returns 1st child    */

    list<PR*>::iterator getLastChild_PRid2();                      /* Returns last child   */

    list<PR*>::iterator PRid2_Terminator();                        /* Returns terminator   */

 

    void setParent_DEid(DE *currentRow);     /* Initializes parent   */

    DE *getParent_DEid();                    /* Returns parent       */

 

    void link_row () ;

    void parse_row (char *buffer, int idx, hcg_key hcg_k);

    void add_row (char *viewname);

    list<CO*>::iterator delete_row (list<CO*>::iterator rowIter);  /* Deletes the row      */

 

    /* Prints a row to a file */

    void dump_row (char *viewname, char *file_name, int new_version, char *modestr);

 

    /* Prints a row to terminal */

    void dump_row (char *viewname, int new_version, char *modestr);

 

    CO();  /* Constructor */

   ~CO();     /* Destructor */

} ;

 

class COtable         /* CO - Course information */

{

private :

  char       abbrev[5];                          /* 2-char abbreviation                   */

  char      *name;                               /* Table name                            */

  list<CO*>  row;                                /* List of rows                          */

 

public :

  char *getAbbrev();                             /* Returns table abbreviation            */

  char *getName();                               /* Returns table name                    */

 

  unsigned int getRowCount();                    /* Returns number of rows in table       */

  list<CO*>::iterator getFirstRow();             /* Returns iterator to first row         */

  CO *getPrevRow(CO *curr);                      /* Returns prev iterator                 */

  CO *getNextRow(CO *curr);                      /* Returns next iterator                 */

  CO *getLastRow();                              /* Returns last iterator                 */

  list<CO*>::iterator Terminator();              /* Returns list terminator               */

  bool isFirstRow(CO * curr);                    /* True if curr is iterator to first row */

  bool isLastRow(CO * curr);                     /* True if curr is iterator to last row  */

 

  void insertRow(CO *newRow);                    /* Inserts a row at end of table         */

  list<CO*>::iterator removeRow(list<CO*>::iterator i); /* Removes row from table         */

 

  /* Prints out entire table to file */

  void dump_table(char *viewname, char *file_name, int new_version, char *modestr);

 

  /* Prints out entire table to terminal */

  /*  void dump_table(char *viewname, int new_version, char *modestr); */

 

  COtable(char *abbrev, char *name);             /* Constructor                           */

 ~COtable();                                     /* Destructor                            */

};

 

class PR : public RC

{

private :

    hcg_key COid1;  /* foreign pkey of course that has this prereq */

    hcg_key COid2;  /* foreign pkey of course that describes this prereq */

    char  test[36]; /* field used for testing only */

 

    CO   *COid1_pp;     /* Parent     */

 

    CO   *COid2_pp;     /* Parent     */

 

public :

    void set_COid1 (hcg_key id);

    void set_COid2 (hcg_key id);

    void set_test (char  *value);

 

    hcg_key get_COid1 ();

    hcg_key get_COid2 ();

    char* get_test ();

 

 

    void setParent_COid1(CO *currentRow);     /* Initializes parent   */

    CO *getParent_COid1();                    /* Returns parent       */

 

    void setParent_COid2(CO *currentRow);     /* Initializes parent   */

    CO *getParent_COid2();                    /* Returns parent       */

 

    void link_row () ;

    void parse_row (char *buffer, int idx, hcg_key hcg_k);

    void add_row (char *viewname);

    list<PR*>::iterator delete_row (list<PR*>::iterator rowIter);  /* Deletes the row      */

 

    /* Prints a row to a file */

    void dump_row (char *viewname, char *file_name, int new_version, char *modestr);

 

    /* Prints a row to terminal */

    void dump_row (char *viewname, int new_version, char *modestr);

 

    PR();  /* Constructor */

   ~PR();     /* Destructor */

} ;

 

class PRtable         /* PR - Prerequisite */

{

private :

  char       abbrev[5];                          /* 2-char abbreviation                   */

  char      *name;                               /* Table name                            */

  list<PR*>  row;                                /* List of rows                          */

 

public :

  char *getAbbrev();                             /* Returns table abbreviation            */

  char *getName();                               /* Returns table name                    */

 

  unsigned int getRowCount();                    /* Returns number of rows in table       */

  list<PR*>::iterator getFirstRow();             /* Returns iterator to first row         */

  PR *getPrevRow(PR *curr);                      /* Returns prev iterator                 */

  PR *getNextRow(PR *curr);                      /* Returns next iterator                 */

  PR *getLastRow();                              /* Returns last iterator                 */

  list<PR*>::iterator Terminator();              /* Returns list terminator               */

  bool isFirstRow(PR * curr);                    /* True if curr is iterator to first row */

  bool isLastRow(PR * curr);                     /* True if curr is iterator to last row  */

 

  void insertRow(PR *newRow);                    /* Inserts a row at end of table         */

  list<PR*>::iterator removeRow(list<PR*>::iterator i); /* Removes row from table         */

 

   void dump_table(char *viewname, char *file_name, int new_version, char *modestr);

 

  /* Prints out entire table to terminal */

 

 

/* Prints out entire table to file [6] */

 

  /*  void dump_table(char *viewname, int new_version, char *modestr); */

 

  PRtable(char *abbrev, char *name);             /* Constructor                           */

 ~PRtable();                                     /* Destructor                            */

};

 

     First of all, the table classes for each type of table are now separate classes to support the concept of the table as an external container.  The most interesting member variable is the list<PR*> row member variable which replaces the next_ptr and prev_ptr member variables.  In addition to holding pointers to all of the rows and consolidating these pointers in one place, the list<PR*> row member variable is private to the table class.  If one wishes to access a row of the table, a message must be sent to the table to retrieve an iterator to the row.

 

            The original generated database code contained the following declarations of global variables:

 

class PR *PRbegin, *PRcurr, *PRtemp, *PRend, *PRelt, *PRcurr2, *PRtemp2;

 

     These pointers have been replaced with STL iterator objects in order to assist in navigating the list containers.

 

list<PR*>::iterator PRcurr, PRtemp, PRelt, PRcurr2, PRtemp2;

 

     An iterator to an STL container is more than just a pointer - it is an object in its own right that can be constructed, assigned to, and copied.  It contains overloaded operators so that it can be dereferenced in an identical fashion to a pointer.  It can also be incremented and decremented the same way a pointer can.  The goal of an iterator is to be able to function as a pointer can, but to provide additional functionality specific to the STL container idiom.  This can be demonstrated by looking at the new version of the child_loop macro that is generated as compared to the old one.

 

     Here is the old version of the macro.  Notice that in order to increment the child##curr pointer to the next child, the application needs to know the name of the foreign key field that is being followed.

 

#ifndef USE_STL

#define child_loop(parent,child,pkey,ckey) \

  for (child##curr = parent##curr->pkey##_fcp; \

       ((child##curr != NULL) && ((parent *) child##curr) != (parent##curr)); \

       child##curr = (child *) child##curr->ckey##_fpp)

#endif /* USE_STL */

 

            Below are the macros which comprise the same child loop in the new version.  Notice how the next_child macro does not need to be told when it increments child##curr which list it is following.  This is because when an iterator is assigned to, it contains a reference to the specific container instance that it is following.  In our case, if  start_child_chain initializes PRcurr to follow

COcurr->getFirstChild_PRid1, then COcurr++ understands that it needs to point to the next element of the PRid1_children container. 

 

#undef start_child_chain

#define start_child_chain(parent,child,fkey) \

    child##curr = (*parent##curr)->getFirstChild_##fkey()

 

#undef next_child

#define next_child(child,fkey) \

     child##curr++

 

#undef child_loop

#define child_loop(parent,child,pkey,ckey) \

    for(start_child_chain(parent,child,pkey); child##curr\ != (*parent##curr)->pkey##_Terminator(); \

         next_child(child,pkey))

 

            This added functionality and the ability of the iterator to retain state within the container that it points to was the biggest reason we changed the type of the global cursors from pointers to iterators.  By making the data structure non-intrusive, a pointer to a row object can no longer obtain the location of the next row within that table or the next sibling of a common parent row, because the row itself no longer contains this information.  As a consequence, to get the next row of a table from only a pointer to a row object, one would have to search the appropriate container in a linear fashion to obtain an iterator to the list node that contains the pointer to the row object, and then increment that iterator.  This made keeping the global variables as type “pointer to row” prohibitively expensive from a performance point of view, so we did not implement complete backward compatibility in our generated database code.  Such a version is possible, however.  A simple test application using this methodology is in the $CASE directory subtree.  Its location will be listed in Appendix B.

 

Container operations

 


            When discussing the generated database methods, there are three general cases to consider.

1)     Insertion

2)     Deletion

3)     Traversal

 

These three cases have two separate subcases to consider.  The first is the operation on a row as an element of a table container.  The second is the operation on a row as an element of a child list container.

 

Inserting a row into a table container is done in two ways.  When an application is initialized, pr_load is called.  Pr_load calls a function called load_data, which then calls the row object’s link_row method.  When a row is added after initialization, the application calls the add_row method of the row object, which does many of the same things that load_data does in a type-specific way, and then also calls the row object’s link_row method.

The link row method can have two basic forms, depending on whether or not the element being inserted is a parent row or a child row.  In our running example, the two link row methods look as follows:

 

// This is the child row object

void PR::link_row()

{

 hcg_key PRid;

 

#ifndef USE_STL

     insert_element(PR);

#else

PRtab->insertRow(this);

#endif /* USE_STL */

 

     link_parent_bp_m(PR,CO,COid1,COid,COid1,PRid1);

     link_parent_bp_m(PR,CO,COid2,COid,COid2,PRid2);

}

 

 

 

// This is the parent row

void CO::link_row()

{

 hcg_key COid;

 

#ifndef USE_STL

     insert_element(CO);

#else

COtab->insertRow(this);

#endif /* USE_STL */

 

     link_child_bp_m(PR,CO,COid1,COid,COid1,PRid1);

     link_child_bp_m(PR,CO,COid2,COid,COid2,PRid2);

     link_parent_bp_m(CO,DE,DEid,DEid,DEid,COid);

}

 

     Insertion becomes much simpler when using the list containers.  The 43 lines of pointer manipulation code in the macro insert_element that are used under the old scheme are replaced by a call to the row’s insertRow method.  The code for this method is straightforward.  This code simply walks the row list and inserts the entry into the row in sorted order. [7]

 

 

void PRtable::insertRow(PR *newRow)

{

  list<PR*>::iterator iter;

 

  iter = row.begin();

  while (iter != row.end() && (*iter)->get_pkid() < newRow->get_pkid())

    iter++;

 

  row.insert(iter, newRow);

}

 

           

           When a new row is inserted, it will call the macro link_child_bp_m to connect itself to all of its children, and link_parent_bp_m to connect a row to its parents.   Since the type of the container and number of pointers are transparent to the user, the twelve macros link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m

have been reduced to link_child_nobp_m, link_parent_nobp_m, unlink_parent_nobp_m, and unlink_child_nobp_m.  The rest have been #define d to these four.  The link_child_bp_m macro looks like this.

 

#define link_child_bp_m(ca,pa,cf,pk,p,c) \

ca##curr2 = ca##tab->getFirstRow(); \

while(ca##curr2 != ca##tab->Terminator()) \

{ \

  if (key_compare1((*ca##curr2)->get_##cf(),this->get_pkid()) == 0) \

  { \

    (*ca##curr2)->setParent_##p(this);\

    insertChild_##c((*ca##curr2)); \

  } \

  ca##curr2++; \

}

 

            The key element here is the insertChild_##c call, which adds a pointer to the child value to the container for the correct key field pointing upward from the child to the parent’s primary key.  Before doing this, the instance set the child row’s parent pointer to point at itself.  The naming convention for member variables used in the old code is maintained to determine which set of field values need to be compared for insertion into which childlist.  Notice that for compatibility with existing applications, the current interface to the macros is maintained even if this leaves some of the arguments unused.

 

            The code for linking a row to all of its parent rows from the perspective of the child looks like this:

 

/* We use 'this' ptr in place of ca##curr */

#define link_parent_bp_m(ca,pa,cf,pk,p,c) \

for (pa##curr2 = pa##tab->getFirstRow(); pa##curr2 != pa##tab->Terminator() && (key_compare1((*pa##curr2)->get_pkid(),this->get_##cf()) < 0); pa##curr2++); \

if ((pa##curr2 != pa##tab->Terminator()) && (key_compare1(this->get_##cf(),(*pa##curr2)->get_pkid()) == 0)) \

{ \

  this->setParent_##p((*pa##curr2)); \

  (*pa##curr2)->insertChild_##c(this); \

}

 

     This code walks the parent table until it finds the primary key which matches the given foreign key field in our table.  When it finds the row, which is in sorted order in the table, the operation is the same from the child perspective as it was from the parent perspective, except the roles are reversed.  The child sets his own parent pointer to point at the parent he has found, and then sends a message to the parent to insert it into the parent’s corresponding child list.

 

            Traversal starts with getting an iterator to the first element of the container in question, whether it is a table container or a list of children corresponding to a foreign key in the child row.  It then increments by incrementing the iterator, and then terminates upon reaching the end of the container.  For table containers, the table_loop macro traverses the entire table. [8]

 

#define table_loop_all(tbl)   for (tbl##curr = tbl##tab->getFirstRow(); tbl##curr != tbl##tab->Terminator(); tbl##curr++) 

 

            The getFirstRow() method of the table class simply returns row.begin(), and the Terminator() method returns row.end().  The begin() and end() methods are methods common to any STL container, whether it is a vector, list, or even a string.  They return an iterator to the given location.  Users should be aware that the end() method of an STL container actually points at a sentinel value one element past the actual end of the container.  If this entry is being translated into a pointer, the user must make certain to set the pointer value of the iterator returned from an end() call to null, because the compiler may not do so.  Incrementing the iterator is simple because once it has been assigned to an element from a particular container, it retains the context of that container, so incrementing the value automatically returns the next value from the correct list, be it a child list or a table list.

 

            Traversing a child list is just as easy.  The following macros are used to traverse child lists:

 

#define start_child_chain(parent,child,fkey) \

    child##curr = (*parent##curr)->getFirstChild_##fkey()

 

#define next_child(child,fkey) \

child##curr++

 

#define child_loop(parent,child,pkey,ckey) \

    for(start_child_chain(parent,child,pkey); child##curr != (*parent##curr)->pkey##_Terminator(); \

         next_child(child,pkey))

 

     The logic is exactly the same as it is for traversing the table, with the additional specification in the third argument of the suffix to the child list to follow, and the specification in the fourth argument of the name of the foreign key in the child that points upwards towards the parent.

 

            Removal of a row from a table is done by calling the row’s delete_row() method. The delete_row method looks like this:

 

 

list<CO*>::iterator CO::delete_row (list<CO*>::iterator rowIter)

{

 static char rcsid[] = "$Id";

     char temp_key[HCG_KEY_SIZE+1];

 hcg_key COid;

 

 

     unlink_child_bp_m(PR,CO,COid1,PRid1);

     unlink_child_bp_m(PR,CO,COid2,PRid2);

COcurr = rowIter;

     unlink_parent_bp_m(CO,DE,DEid,COid);

     #ifdef DEBUG

          COid = get_pkid();

     decode(temp_key, &COid);

          printf("The primary key deleted is %s\n",temp_key);

     #endif

     rowIter = COtab->removeRow(rowIter);

     delete this;

return rowIter;

}

 

     The key sections here are the two calls to unlink_child_bp_m, the call to unlink_parent_bp_m, and the call to the table’s removeRow method.  The call to removeRow simply removes the row pointer from the table’s list of row pointers.  The unlink_child_bp_m and unlink_parent_bp_m macros looks like this:

 

#define unlink_child_bp_m(ca,pa,p,c) \

while(this->getFirstChild_##c() != this->c##_Terminator()) \

{ \

      (*getFirstChild_##c())->setParent_##p##(NULL); \

      removeChild_##c( getFirstChild_##c() ) ; \

}

 

/* ca##curr is an iterator that is set in ca::delete_row */

#define unlink_child_bp_m (ca,pa,p,c) \

if(getParent_##p() != NULL) \

(getParent_##p())->removeChild_##c(ca##curr);

 

     The unlink_child_bp_m macro simply keeps removing the first element from the appropriate child container in the parent until it is empty after setting its appropriate parent pointer to null.  The unlink_child_bp_m macro calls the parent row object’s removeChild_c## method to send the actual private list object a remove message.  The remove method of an STL container object takes care of the searching for values and removes all instances of a given value in a container. [9]

 

 

Changes to macros and the API

 


For the most part, we were able to implement all of the changes in our phase 2 specification.  However, there are four important notes to make.

 

1) The third argument for the last_child macro changed from “fkey that points upwards from child to parent” to “fkey that points downward from parent to child”

2) XXbegin and XXend were not defined to call the getFirstRow and getLastRow methods of the class as specified in the phase two specification because they need to remain values that can be assigned to.  Therefore, they are now list iterators as the rest of the macros are.

3) The access macros (XX__YY, YY__ZZ) were not implemented due to time constraints

4) The del_row, insert_element, and free_table macros are obsolete and has been eliminated.

In order for the generated macros to support the new method of organizing table and row classes, any macros that referred to the former public embedded pointers were rewritten in order to refer to new table and row class methods.  In addition, all references to the variables that were formerly referred to as global pointers to row objects were rewritten to use syntax supported by the new declaration of these variables as STL list iterators.  The following macros were modified because of their direct reference to XX::next_ptr and XX::prev_ptr.

 

next_all, pr_find, find_str_loop_all, find_str_loop, find_key_loop_all, find_key_loop, find_fkey_loop_all, find_fkey_loop, find_int_loop_all, find_int_loop, table_loop_all, table_loop, pr_var_find, var_find_str_loop_all, var_find_str_loop, var_find_int_loop_all, var_find_int_loop, var_table_loop_all, var_table_loop, insert_element, link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, del_row (obsolete), and free_table (obsolete).

 

 

All references to parent pointers *_pp must be modified to call XX::getParent_ZZid() or XX::setParent_ZZid() depending on the specified action.  The following macros were modified for this purpose:

 

link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m.

 

 

As was mentioned previously, these twelve macros were condensed down to link_child_bp_m, link_parent_bp_m, unlink_child_bp_m, and unlink_parent_bp_m Any macro that uses *_fpp or *_bpp was modified to use the child-row iterators in the row class:

 

next_child, last_child, child_loop, var_child_loop, var_last_child, link_child_bp_m, link_child_nobp_m, link_parent_bp_m, link_parent_nobp_m, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_bp_m, and unlink_child_nobp_m.

 

 

Any macro that uses *_bcp or *_fcp was modified to use the child-row iterators in the row class:

 

start_child_chain, first_child, child_loop, var_child_loop, var_first_child, link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m.

 

 

The following macros also needed to be modified:

 

child_exists:  There is no longer a (struct parent *).  The parent is a class and it is type XXtable, not XX.

 

exists:  This macro uses XXcurr to check if the table has any rows.  However, using the row iterator in XXtable, we can check the row size to verify that the table contains rows.

 

pr_check_str:  This macro attempts to access a row's private data, and it needs to be modified to call the row's get* method.

 

The access macros will need to be modified to call the row's getParent_YYid() method.  For example:

 

#define XX__YY XX->getParent_YYid()

#define ZZ__YY ZZ->getParent_XXid()->getParent_YYid()

 

This work to change the access macros defined above was not completed.

 

Obsolete Functions/Methods/Macros

 

pr_parse():  The code from pr_parse() has been modified and moved into load_data(), as described above, making pr_parse() obsolete.

 

pr_link():  pr_link() is called only by load_data().  Moving its code into load_data(), renders it obsolete.  Note that the only thing pr_link() does is call XX::link_row().

 

XX::create_row():  The new operator calls XXrow's constructor making create_row() obsolete.  (See load_data() for an example.)

 

del_row:  This macro is replaced by XX::removeRow() followed by the delete operator (see XXrow::delete_row()).

 

pr_del_bt(): This function is called only by del_row() which is obsolete.  Hence, this function is also obsolete.  Bear in mind that the rows are now stored in STL containers not btrees as this function suggests.  Since pr_del_bt() is the only function in pr_delete.cpp, this file does not need to be generated by gencpp.

 

free_table:  This macro is obsolete.

 

pr_get_int, pr_get_flt, pr_get_str, and pr_get_key:  These four macros have been obsolete since the original conversion of chgen to gencpp.  However, they were never removed.

 

pr_delete:  This macro has been obsolete since the original conversion of chgen to gencpp.

 

pr_dump_row:  This macro calls dump_row().  However, the dump_row() function does not exist in the generated files.  It does, however, exist in the gencpp code.  Regardless, there is no reason to continue writing this macro to the application's *.h file.

 

alloc_element:  This macro is no longer needed now that we use the new operator to allocate space for a row.  However, it is used in gencpp itself.

 

link_child_bp_m, unlink_child_bp_m, link_parent_bp_m, and unlink_parent_bp_m:  All link and unlink macros that implement backpointers are obsolete.

Testing

 

            Testing of this application was primarily centered around making certain that deleting rows from a table maintained the proper links between parents and children.  Another concern of ours was that we wanted to make sure that in a scenario such as the one used as an example throughout this paper where a child row had two different parent rows, particularly of the same type, that we would be able to maintain these relationships correctly.

 

            The schema used for our tests is pictured below:

 

 

 

 

 

 

 

 

 

 


 

 

0..M

 
 

 

 


                                                                                                                                   

 

 

 

 

 

 

 

 

 

 

 

 



            In this schema, a department at a school can offer zero to many courses, but a course can be offered by exactly one department.   A Prerequisite is an associative object representing the many to many relationship between two courses such that COid1 points to an advanced course which has a given COid2 as a basic prerequisite.  In addition to these two tables, two simple tables were used to test error conditions.  One of these tables is empty, and the other contains one row and has no parents or children.  Here is the schema definition file used:

 

/********************************************************************/

/* File: test_course_prereq.sch                                     */

/* Schema file used for testing STL list in gencpp                  */

/* Written by Katherine Miu & Mark Ottesen for 91.522 on 12/16/2000 */

/*                                                                  */

/********************************************************************/

 

empty EM       /* EM - Empty table for testing only */

{

EMid  NA c8 1  /* unique pkey of empty */

test  NA c8 0  /* test field */

}

 

onerow ON     /* ON - table containing only one row for testing only */

{

ONid NA c8  1 /* unique pkey of onerow */

name NA c11 0 /* testing only */

}

 

dept DE        /* DE - University Department */

{

DEid NA c8  1 /* unique pkey of dept */

name NA c20 0 /* dept name */

}

 

course  CO       /* CO - Course information */

{

COid       NA  c8  1  /* unique pkey of course */

DEid       NA  c8  1  /* foreign pkey of dept */

name       NA  c15 0  /* course name */

}

 

prereq PR     /* PR - Prerequisite */

{

PRid  NA c8  1 /* unique pkey of prereq */

COid1 NA c8  1 /* foreign pkey of course that has this prereq */

COid2 NA c8  1 /* foreign pkey of course that describes this prereq */

test  NA t35 0 /* field used for testing only */

}

 

            The main program that was used for testing tests the following activities:

 

1) Loading a database from disk to test the logic of building a set of tables

2) Adding a new child row to the prerequisite object, which tests all of the row construction logic, and all the parent-child linking logic

3) Looping through all five tables using table_loop to verify the results

4) Testing the ability to loop through both sets of children of a given parent to determine integrity of parent child relationships. 

5) Tests our ability to determine the first and last elements of a container correctly.

6) Deleting a child row to correctly determine our ability to unlink a row from its parent rows as well as the table container.  This also tests our row destruction capability

7) Tests our ability to find rows that have particular values by using find_str_loop

8) Verifies the integrity of child lists after deleting a row.

 

 

/*********************************************************************/

/*                                                                       */

/* test_co_pr.cc:  Test file for STL modifications                       */

/*                                                                       */

/* Written by Mark Ottesen and Kathy Miu for 2kf522                      */

/*                                                                       */

/*********************************************************************/

 

#define MAIN

#include <string.h>

#include "test_co_pr.h"

#include "EMschema.h"

#include "ONschema.h"

#include "DEschema.h"

#include "COschema.h"

#include "PRschema.h"

 

/* The following defines are only necessary if attempting to use

** this file when USE_STL is undefined--i.e. to run the older version

** of gencpp.

*/

#ifndef USE_STL

#define generic_EMcurr EMcurr

#define generic_ONcurr ONcurr

#define generic_DEcurr DEcurr

#define generic_COcurr COcurr

#define generic_PRcurr PRcurr

#else

#define generic_EMcurr (*EMcurr)

#define generic_ONcurr (*ONcurr)

#define generic_DEcurr (*DEcurr)

#define generic_COcurr (*COcurr)

#define generic_PRcurr (*PRcurr)

#endif

 

//#include <string>

//#include <list>

//#include <iostream>

 

//using namespace std;

 

 

//void StripTrailingBlanks(string& mutant)

//{

//    int lnspace = mutant.find_last_not_of(' ');

//    if(lnspace != string::npos)

//    {

//    mutant.erase(lnspace + 1, mutant.size());

//    }

//   

//}

   

int main()

{

  PR *newPRrow;

 

#ifdef USE_STL

  printf("\n\n****USING STL CONTAINERS FOR TABLES AND ROWS--KMIU&MOTTESEN****\n\n");

#else

  printf("\n\n****USING NTANSALA GENCPP****\n\n");

#endif

 

  pr_init("test_co_pr.vdf", "test_co_pr.dat");

  pr_load("DefaultView", "test_co_pr.dat");

  pr_dump("DefaultView", "test_co_pr.out", 0, "w");

 

printf("\n*** TEST OF INSERTROW() & ADD_ROW & PR_CHECK_FKEY & PR_CHECK_STR ***\n");

#ifdef USE_STL

    newPRrow = new PR;

#else

    newPRrow = newPRrow->create_row();  /* class method */

#endif

//    newPRrow->set_test("INSERT: Acc04's prereq is Acc03");

    newPRrow->add_row("DefaultView");

 

 

printf("\n*************** TEST OF TABLE_LOOP **************\n");

 

    table_loop("DefaultView", EM)  /* on empty table */

    {

    }

    table_loop("DefaultView", ON)  /* on one-row table */

    {

      printf("ON: name is *%s*\n",generic_ONcurr->get_name());

    }

    table_loop("DefaultView", DE) /* dept table */

    {

      printf("DE: dept name is *%s*\n",generic_DEcurr->get_name());

    }

    table_loop("DefaultView", CO) /* course table */

    {

      printf("CO: course name is *%s*\n",generic_COcurr->get_name());

    }

    table_loop("DefaultView", PR) /* prereq table */

    {

      printf("PR: prereq test field is *%s*\n",generic_PRcurr->get_test());

    }

 

printf("\n************* TEST CHILD_LOOP & FIRST_CHILD & LAST_CHILD ***********\n");

 

    table_loop("DefaultView", DE)

    {

      child_loop(DE,CO,COid,DEid)

      {

        if (first_child(DE,CO,COid))

        {

          printf("\nCO children of DE %s (first_child)\n\n",generic_DEcurr->get_name());

        }

        printf("   %s\n",generic_COcurr->get_name());

#ifdef USE_STL

        if (last_child(DE,CO,COid))

#else

        if (last_child(DE,CO,DEid))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

    }

    table_loop("DefaultView", CO)

    {

      child_loop(CO,PR,PRid1,COid1)

      {

        if (first_child(CO,PR,PRid1))

        {

          printf("\nPRid1 children of CO %s (first_child)\n\n",generic_COcurr->get_name());

        }

        printf("    %s\n",generic_PRcurr->get_test());

#ifdef USE_STL

        if (last_child(CO,PR,PRid1))

#else

        if (last_child(CO,PR,COid1))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

 

      child_loop(CO,PR,PRid2,COid2)

      {

        if (first_child(CO,PR,PRid2))

        {

          printf("\nPRid2 children of CO %s (first_child)\n\n",                     generic_COcurr->get_name());

        }

        printf("    %s\n",generic_PRcurr->get_test());

#ifdef USE_STL

        if (last_child(CO,PR,PRid2))

#else

        if (last_child(CO,PR,COid2))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

 

    }

    printf("\n** TEST FIND_STR_LOOP & DELETE_ROW & REMOVEROW()**\n");

    find_str_loop("DefaultView", PR, test, "????????????????")

    {

#ifdef USE_STL

      PRcurr = generic_PRcurr->delete_row(PRcurr);

#else

      /* PRcurr->delete_row(); */

      /* This is a bug in the ntansala gencpp because delete_row()

       *   will free PRcurr which is the looping var used by

       *   find_str_loop

       */

      PR *myPRcurr = new PR;                 /* kludge */

      myPRcurr->next_ptr = generic_PRcurr->next_ptr; /* kludge */

      generic_PRcurr->delete_row();

      generic_PRcurr = myPRcurr;                     /* kludge */

#endif

    }

    table_loop("DefaultView", PR) /* prereq table */

    {

      printf("PR: prereq test field is *%s*\n",

            generic_PRcurr->get_test());

    }

 

    /* delete first CO row and set COcurr to next row */

#ifdef USE_STL

    COcurr = COtab->getFirstRow();

    COcurr = generic_COcurr->delete_row(COcurr);

    if (COcurr != COtab->Terminator())

#else

    generic_COcurr = CObegin;

    generic_COcurr->delete_row();

    generic_COcurr = CObegin;

    if (generic_COcurr != NULL)

#endif

      printf("CO name is *%s*\n",generic_COcurr->get_name()); 

            /* part of preceding if stmt */

 

    table_loop("DefaultView", CO) /* course table */

    {

      printf("CO: course name is *%s*\n",generic_COcurr->get_name());

    }

   

printf("\n*** VERIFY CHILD LISTS AFTER DELETION ***\n");

    table_loop("DefaultView", DE)

    {

      child_loop(DE,CO,COid,DEid)

      {

        if (first_child(DE,CO,COid))

        {

          printf("\nCO children of DE %s (first_child)\n\n",                              generic_DEcurr->get_name());

        }

        printf("   %s\n",generic_COcurr->get_name());

#ifdef USE_STL

        if (last_child(DE,CO,COid))

#else

        if (last_child(DE,CO,DEid))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

    }

    table_loop("DefaultView", CO)

    {

      child_loop(CO,PR,PRid1,COid1)

      {

        if (first_child(CO,PR,PRid1))

        {

          printf("\nPRid1 children of CO %s (first_child)\n\n",

                  generic_COcurr->get_name());

        }

        printf("    %s\n",generic_PRcurr->get_test());

#ifdef USE_STL

        if (last_child(CO,PR,PRid1))

#else

        if (last_child(CO,PR,COid1))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

 

      child_loop(CO,PR,PRid2,COid2)

      {

        if (first_child(CO,PR,PRid2))

        {

          printf("\nPRid2 children of CO %s (first_child)\n\n",                           generic_COcurr->get_name());

        }

        printf("    %s\n",generic_PRcurr->get_test());

#ifdef USE_STL

        if (last_child(CO,PR,PRid2))

#else

        if (last_child(CO,PR,COid2))

#endif

        {

          printf("Done (last_child).\n");

        }

      }

 

    }

  return(0);

}

 

 

Future Work

 

            The evolution of gencpp seems to be moving toward generating more object-oriented code, and code that is more idiomatic of the C++ language.  That being said, many of the macros that currently constitute the API can be rewritten as template functions.  There are those, such as child_loop and table_loop, which are not actually functions but merely loop initialization statements.  These would not necessarily be good candidates.  However, there are some that can be reworked in this fashion.

 

            The table class seems to me to be an ideal candidate for becoming a template class.  In this implementation, there are no operations for a particular table type which are specialized for that particular table. [10]

 

            Another possibility for future work would be to investigate ways to make the syntax for dereferencing the global iterators a bit less messy.  For one thing, having to dereference a pointer like this all the time

 

            COcurr = (*COcurr)->delete_row(COcurr);

 

can be a bit awkward.  A partial solution is at the top of the main test file, but it would be nice to have a more elegant solution.  One possibility would be to have the nodes of the table and parent-child lists use a reference-counted implementation of row objects wrapped in an outer class responsible for managing the references.  James Coplein, in his book Advanced C++: Programming Styles and Idioms refers to this as a “singleton letter class”. 

 

            This project would have been easier had the API macros enjoyed a further separation from the internals of the classes than they did when we started.  There still is work to do in that regard.  For instance, it used to be that when the file was loaded into memory from disk, no operations of the row classes were invoked.  We have made changes in this direction.  Furthermore, the row classes actually make calls out to macros to facilitate in the loading and unloading of data.  We have changed the code so that adding a row no longer calls insert_element.  However, it still calls out to link_parent_bp_m and link_child_bp_m to link a row to its parents and children.  It would be nice to see the classes be able to operate independently of the macros generated in the database code.  This way, the macros would be purely there as a convenience to client API programmers.  It seems to me that over time the need for these macros will disappear as the row and table classes develop cleaner interfaces for client API programmers over time.

 

ADDENDUM to Future Work (added by RJL, 0061210): 

 

The present gencpp implementation provides two extreme alternatives: Either a STL Vector for each table and a STL List for each 1 to many parent-child list or 1 to 0..1 super-to-subclass object delegation, or intrusive linked lists for both of these relationships. The best implementation  is a (new) STL Vector for each table (like gencpp -USE_STL) and an intensive linked list for 1 to many or 1 to optional singleton parent-child associations (like gencpp).

 

A Vector for each table is justified since pkeys are never reused and pkeys are always allocated in ascending order. Indexed access will work if [gaps are not allowed in the sequence of objects (e.g. by only marking them deleted, not actually deallocating them). An STL container is too much overhead for parent-child associations;. Intrusive list pointers can (and should) be encapsulated behind class methods for traversal and maintenance.  These methods should be made more efficient by pointer cache-ing or  indexed access.

 

File $PH/06f522/exam2/06f522exam2.doc discusses some other TBD changes, such as:

(1)       Re-implementation of this->get or set fieldname functions along the lines of chgen's pr_accessors.c which uses template-,like macros to generate class methods of the row-class, and  do NOT call the deprecated macros pr_set_<type>(pkey,fieldname, fieldvalue) internally.

 

(2)       A safe implementation of this->set_<fkeyname>  methods. These must call a pr_unlink_* method to unlink the child from the old parent and pr_link_* to relink it to the new parent of an existing child-class object whenever an fkey is updated inside it.   See Problem 1 of $PH/06f522/exam2/06f522exam2.doc.

            Note 2a: an STL implementation would also require methods to be written for removal from the prior parent's child-set and insertion into the new child-set without changing the object's other properties.

            Note 2b: pkeys of existing objects should never be updated:  pkeys are a critical resource whose allocation is serialized for concurrent applications.

            Note 2c: The primary use for an fkey update method is to change the fkey STid when a state transition is made in the COOL-LCP framework component.

 

(3)       Merging of class and instance  levels of object definitions. Declare object container access methods and private metadata (TT-row and its TA-child-set) as 'static' or class-level properties; State Models (SM-rows and child-components) can also be declared static properties of their application subclasses. See $PH/06f522/exam2/06f522exam2.doc Problem 4.


Appendix A - Porting Changes and Bug Fixes to gencpp

 

In order to get gencpp to build with Visual C++, the following fixes needed to be

made to the gencpp source:

 

 

 
 
General Fixes to gencpp Files

 

1.     chgen_define.h:  Added outfile_ext to hold "cc" for unix output files and "cpp" for WNT output files.

 

2.     chgen_define.h:  Moved hardcoded filenames (like pr_load.cc) from open_files.c.  Added filenames for WNT's *.cpp files (like pr_load.cpp).

 

3.     close_files.c:  Added " if (cli_log == TRUE)" before closing prlog_fp.  Without the if statement, closing prlog_fp when it isn't open causes an unhandled exception.

 

Compiler Errors and Warnings when Compiling gencpp Files

 

4.     chgen_externs.h:  C1083: Cannot open include file: 'strings.h': No such file or directory - This file does not exist on NT.

Solution:  Use #ifdef WIN32  to include string.h on NT platforms.

 

5.     chgen.c:  C4035: 'main' : no return value

      Solution:  Set return type of main to "int" and return 0.

 

6.     gen_load_data.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int 

    Solution:  Include prototypes.h.

 

7.     gen_load_data.c:  C4101: 'unpad' : unreferenced local variable 

Solution:  Remove this variable declaration.  Also, remove foundnum, parent_pkey, i,         and j in gen_pr_parse().

 

8.     lut_h_file.h:  C2026: string too big, trailing characters truncated

Problem:  p_lut_h static string cannot hold more than 2048 characters on WNT platform.

Solution:  Split p_lut_h into p_lut_h and p_lut_h2.  Add fprintf statement to print out p_lut_h2 in gen_defines.c and gen_lut.c.

 

9.     gen_defines.c:  C4129: ' ' : unrecognized character escape sequence

Solution:  Move embedded "\n\" to end of line:  "/*    extern void log_pr_add (char *, char *, hcg_ptr ); \n\ ntansala */ in gen_defines().

 

10.  gen_defines.c:  C2026: string too big, trailing characters truncated

Problem:  The string in one of the fprintf statements is longer than 2048 characters.

Solution:  Split the string into two.

 

11.  gen_macros.c:  C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

12.  gen_macros.c:  C4101: 'n' : unreferenced local variable

Solution:  Remove this variable declaration in gen_macros().

 

13.  gen_macros.c:  C2371: 'gen_pr_add_macros' : redefinition; different basic types

Solution:  Change "gen_pr_add_macros()" to "void gen_pr_add_macros(void)" to match its prototype.  Do the same to gen_pr_link_macros(), gen_pr_unlink_macros, and gen_pr_free_macros.

 

14.  gen_macros.c:  C4101: 'i' : unreferenced local variable

Solution:  Remove this variable declaration in gen_pr_link_macros().

 

15.  gen_ops.c:  C4101: 'parent_pkey' : unreferenced local variable

Solution:  Remove this variable declaration in gen_add_row().

 

16.  gen_ops.c:  C4101: 'line_len' : unreferenced local variable

Solution:  Remove this variable declaration in gen_dump_table().  Also, remove temp_string, max declarations, and j.

 

17.  gen_path.c:  C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

18.  gen_path.c:  C4101: 'j' : unreferenced local variable

Solution:  Remove this variable declaration from gen_path().

19.  gen_pr_add.c:  C4101: 'unpad' : unreferenced local variable

Solution:  Remove this variable declaration in gen_pr_add().  Also, remove foundnum, tmp, parent_pkey, and i.

 

20.  gen_pr_add.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

21.  gen_pr_add.c:  C4101: 'rr_entry' : unreferenced local variable

Solution:  Remove this variable declaration from gen_pr_link().  Also, remove tmp.

 

22.  gen_pr_add.c:  C2371: 'gen_pr_create' : redefinition; different basic types

Solution:  Fix gen_pr_create() definition to match its prototype.

 

23.  gen_pr_delete.c:  C4013: 'gen_pr_del_bt' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

24.  gen_pr_delete.c:  C4101: 'foundnum' : unreferenced local variable

Solution:  Remove this variable declaration from gen_pr_delete().  Also, remove parent_pkey, and i.

 

25.  gen_pr_delete.c:  C2371: 'gen_pr_del_bt' : redefinition; different basic types

Solution:  Fix gen_pr_del_bt() definition to match is prototype.

 

26.  gen_pr_dump.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

27.  gen_pr_dump.c:  C4101: 'line_len' : unreferenced local variable

Solution:  Remove this variable declaration from gen_pr_dump().  Also, remove temp_string and j.

 

28.  gen_pr_log.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

29.  gen_pr_log.c:  C4101: 'unpad' : unreferenced local variable

Solution:  Remove this variable declaration from gen_log_do_add().

 

30.  gen_pr_log.c:  C4101: 'elseflag' : unreferenced local variable

Solution:  Remove this variable declaration from gen_log_do_set_int(), gen_log_do_set_flt, gen_log_do_set_key(), and gen_log_do_set_str().

 

31.  gen_pr_stats.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

32.  gen_pr_stats.c:  C4035: 'gen_pr_stats' : no return value

Solution:  Set gen_pr_stats() return type to "void".

 

33.  gen_pr_stats.c:  C4101: 'line_len' : unreferenced local variable

Solution:  Remove this variable declaration from gen_pr_stats().

 

34.  lut_c_file.h:  C2026: string too big, trailing characters truncated

Problem:  p_lut_c static string cannot hold more than 2048 characters on WNT platform.

Solution:  Split p_lut_c into seven strings.  Add fprintf statements to print p_lut_c2 through p_lut_c7 in gen_lut.c and gen_pr_utils.c.

 

35.  btree_c_file.h:  C2026: string too big, trailing characters truncated

Problem:  p_btree_c static string cannot hold more than 2048 characters on WNT platform.

Solution:  Split p_btree_c into seven strings.  Add fprintf statements to print p_btree_c2 through p_btree_c7 in gen_lut.c and gen_pr_utils.c.

 

36.  gen_rr_matrix.c:  C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

37.  gen_rr_matrix.c:  C4101: 'obj_ptr' : unreferenced local variable

Solution:  Remove this variable declaration from gen_rr_matrix().  Also, remove k.

 

38.  gen_structures.c:  C4101: 'index' : unreferenced local variable

Solution:  Remove this declaration from gen_structures().

 

39.  make_cptr.c:  C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

40.  make_pptr.c:  C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

41.  open_files.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

42.  parse_schema.c:  C4101: 'obj_ptr' : unreferenced local variable

Solution:  Remove this declaration from parse_schema_chgen() and parse_schema_gendb().

 

43.  parse_schema.c:  C4047: '!=' : 'int ' differs in levels of indirection from 'void *'

Solution:  Change "NULL" in "while (tempstr2[x] != NULL)" to " '\0' " in parse_scheme_gendb().

 

44.  parse_schema.c:  C4013: 'pr_init' undefined; assuming extern returning int

Solution:  Add function prototypes at top of file for pr_init(), pr_load(), and pr_dump().

 

45.  parse_schema.c:  C4101: 'i' : unreferenced local variable

Solution:  Remove this declaration from TT_to_tt().  Also, remove parse_state().

 

46.  pr_delete.c:  C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

47.  pr_delete.c:  C4013: 'pr_del_bt' undefined; assuming extern returning int

Solution:  Add pr_del_bt prototype to top of file.

 

48.  pr_delete.c: C4101: 'result' : unreferenced local variable

Solution:  Remove declaration from pr_del_bt().

 

49.  pr_dump.c:  C4013: 'exit' undefined; assuming extern returning int

Solution:  Include stdlib.h.

 

50.  pr_dump.c:  C4013: 'find_view_idx' undefined; assuming extern returning int

Solution:  Add find_view_idx prototype to top of pr_dump.c.

 

51.  pr_dump.c: C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

52.  pr_dump.c:  C4013: 'meets_view' undefined; assuming extern returning int

Solution:  Add meets_view prototype to top of pr_dump.c.

 

53.  pr_dump.c:  C4101: 'outkey' : unreferenced local variable

Solution:  Remove declaration from pr_dump().

 

54.  pr_dump.c:  C4013: 'hcg_update_version' undefined; assuming extern returning int

Solution:  Add hcg_update_version prototype to top of pr_dump.c.

 

55.  pr_free.c:  C4013: 'free' undefined; assuming extern returning int

Solution:  Include stdlib.h and malloc.h.

 

56.  pr_load.c:  C1083: Cannot open include file: 'strings.h': No such file or directory

Solution:  Include string.h (not strings.h) for WIN32 only.

 

57.  pr_load.c: C4013: 'btree_does_node_exist' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

58.  pr_load.c:  C4101: 'abbr_size' : unreferenced local variable

Solution:  Remove declaration from lut_insert_element().

 

59.  pr_load.c:  C4101: 'idx' : unreferenced local variable

Solution:  Remove declaration from lut_delete_element().

 

60.  pr_load.c: C4101: 'p_ch' : unreferenced local variable

Solution:  Remove declaration from lut_get_name().

 

61.  pr_load.c: C4101: 'ret_stat' : unreferenced local variable

Solution:  Remove declaration from btree_create().

 

62.  pr_load.c: C4101: 'rval' : unreferenced local variable

Solution:  Remove declaration from ver_decode().

 

63.  pr_load.c: C4101: 'vri' : unreferenced local variable

Solution:  Remove declaration from set_version().

 

64.  pr_load.c: C4013: 'close' undefined; assuming extern returning int

Solution:  Since fopen() is called to open a file, fclose should be called to close it.  Replace two calls to "close" with "fclose" in pr_init().

 

65.  pr_load.c:  C4013: 'btree_initialized' undefined; assuming extern returning int

Solution:  Add btree_initialized prototype to pr_load.c.

 

66.  pr_load.c: C4101: 'filevar' : unreferenced local variable

Solution:  Remove declaration from pr_init().

 

67.  pr_load.c: C4013: 'bzero' undefined; assuming extern returning int

Solution:  Replace bzero with memset.  This code was already there, but memset was commented out.

 

68.  pr_load.c: C4101: 'tempkey' : unreferenced local variable

Solution:  Remove declaration from pr_add().

 

69.  pr_load.c: C4013: 'close' undefined; assuming extern returning int

Solution:  Replace close with fclose in pr_load().

 

70.  pr_load.c:  C4101: 'result' : unreferenced local variable

Solution:  Remove declaration from pr_find_bt().

 

71.  pr_stats.c: C4013: 'exit' undefined; assuming extern returning int

Solution:  Include stdlib.h.

 

72.  pr_stats.c: C4013: 'lut_insert_element' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

73.  validate_schema.c: C4013: 'strncpy_null' undefined; assuming extern returning int

Solution:  Include prototypes.h.

 

 

 

Compiler Errors when compiling Generated Database Code

 

As was mentioned earlier, one of the first things we discovered when we started working on gencpp was that the source files that it generates would not compile without substantial alteration.  The following are a list of compiler errors in the generated output files.  Gencpp was updated so that it generates code that compiles without these errors.

 

74.  C2106: ‘=’ : left operand must be l-value

Line of code in XXops.cpp.

link_child_nobp_m(...)

Actual problem is in link_child_nobp_m macro in SISxxx.h file.  “(pa *) is not needed on WIN32.(pa *) ca##curr2->p##_fpp = pa##curr;

Solution: Fix gen_macros.c lines 396, 404, 430 to remove leading cast for WIN32 only.

Note: p##_fpp is type (void *).

Note2: This is also a problem on UNIX that we have not fixed, because this code is not compiled when using the STL version of the database code.  Yet another good reason to trust the libraries shipped by your compiler vendor when they are available.

 

75.  C2664: ‘decode_retstr’ : cannot convert parameter 1 from ‘unsigned long’ to ‘unsigned long *’

Line of code in XXops.cpp.

decode_retstr(value)

Actual problem is decode_retstr expects hcg_key* but “value” is type hcg_key.

Solution: Fix gen_ops.c line 119 to change “value” to “&value”.

 

76.  C2106: ‘=’ : left operand must be l-value

Line of code in XXops.cpp.

link_parent_nobp_m(...)

Actual problem is in link_parent_nobp_m macro in SISxxx.h file.  “(pa *) is not needed on WIN32.

(pa *) ca##curr->p#3_fpp = pa##curr2;

Solution:  Fix  gen_macros.c to remove leading cast for WIN32 only.  Search for link_parent_nobp_m to find the spot(s).

 

77.  C2106: ‘=’ : left operand must be l-value

Line of code in XXops.cpp.

unlink_parent_nobp_m(...)

Actual problem is in unlink_parent_nobp_m macro in SISxxx.h file.

(pa *) ca##curr2->p##_fpp = pa##temp;

Solution: Fix gen_macros.c to remove leading cast for WIN32 only.  Search for unlink_parent_nobp_m.

 

78.  C2447: missing function header (old_style format list?)

Line of code in XXops.cpp.

void FA::set_start_date (char *value)

;}

Actual problem is the extra semicolon.

Solution: Remove extra semicolon within switch/case ‘d’ statement in gen_set_ops() in gen_ops.c.

fprintf(XXops_fp[i],”void %s::set_%s (char *value)\n; ...

 

79.  Extra ‘;’s in XXschema.h files

Solution: Remove extra semicolon within switch/case ‘d’ statement in gen_structures.c.

fprintf(XXschh_fp[j],”   void set_%s (char *value);\n;...

 

80.  C2660: ‘strcpy’ : function does not take 3 parameters

Line of code in XXops.cpp.

strcpy(start_date, value, 26) ;

Actual problem: strcpy should be strncpy.

Solution: Make fix within switch/case ‘d’ statement in gen_set_ops in gen_ops.c.

 

81.  C2065: ‘exit’ : undeclared identifier

Line of code in pr_dump.cpp.

exit(1);

Actual problem is pr_dump.cpp needs to include stdlib.h on NT.

Solution: Make the change in gen_pr_dump() in gen_pr_dump.c.

 

82.  C2065: ‘free’ : undeclared identifier

Line of code in pr_free.cpp.

free_table(...)

Actual problem is pr_free.cpp needs to include both stdlib.h and malloc.h on WIN32.

Solution: Make the changes in gen_pr_free() in gen_pr_free.c.

 

83.  C2065: ‘strlen’ : undeclared identifier

Line of code in pr_load.cpp.

for (i=strlen(s)-1; (i>=0) && ...

Actual problem is pr_load.cpp needs to include string.h and not strings.h on WIN32.

Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.

 

84.  C2065: ‘atoi’ : undeclared identifier

Line of code in pr_load.cpp.

test = atoi(name);

Actual problem is pr_load.cpp needs to include stdlib.h.

Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.

 

85.  C2202: ‘key_compare’ : not all control paths return a value

Line of code in pr_load.cpp.

key_compare(...)

This error may arise only due to the compiler warning setting on NT.  However, the code fix is to change the line:

if ( (unsigned long)k1 > (unsigned long)k2 ) return 1;

to the following

return 1;

Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.

 

86.  C2202: ‘key_compare1’ : not all control paths return a value

See previous error.

87.  C2664: ‘clear_mem’ : cannot convert parameter 1 from ‘struct ts_list_type[7] to ‘char *’

Line of code in pr_load.cpp.

clear_mem(hcg_ts_list,sizeof(hcg_ts_list));

Solution: Cast hcg_ts_list to (char *) in gen_pr_init() in gen_pr_init.c.

 

88.  C2056: ‘close’ : undeclared identifier

Line of code in pr_load.cpp.

close(hcg_ascii_fp);

Actual problem is fclose() should be called because fopen() was used.

Solution: Make the changes (error occurs twice) to gen_pr_init() in gen_pr_init.c.

 

89.  C2065: ‘close’ : undeclared identifier

Line of code in pr_load.cpp.

close(hcg_ascii_fp);

Solution: Make the change to gen_pr_load() in gen_pr_load.c.

 

90.  C2065: ‘milliseconds’ : undeclared identifier

Line of code in pr_log.cpp.

log_sleep(milliseconds)

Actual problem is this is an old style declaration.  Need to define __STDC__ when compiling on NT.

 

91.  C2065: ‘usleep’ : undeclared identifier

Line of code in pr_log.cpp.

status = usleep(&t);

Actual problem: missing prototype.

Solution: Make fix to gen_pr_log.c.

 

92.  C2079: ‘newact’ uses undefined struct ‘sigaction’

Line of code in pr_log.cpp.

struct sigaction newact, oldact;

Solution: recode for NT signal handling.   This is NOT done.

 

93.  C2065: 'exit' : undeclared identifier

Line of code in pr_stats.cpp.

exit(1);

Actual problem is pr_stats.cpp needs to include stdlib.h on NT.

Solution: Make the change in gen_pr_stats() in gen_pr_stats.c for WIN32.  If this is needed for unix, then remove the win32 ifdef.

 

94.  C2282: 'class' is followed by 'SE' (missing ','?)

Line of code in SISxxx.h.

extern class SE *SEbegin, ... ;

Solution: Rename SISxxx.c to SISxxx.cpp.

 

Runtime Errors

 

            Naturally, once we actually got gencpp to generate files that would compile and link, we couldn’t expect it to work properly right out of the gate, either.  So, we had some runtime errors to fix as well.  Here was the most graphic one.

 

95.  Access Violation

XXid_fcp fields are not initialized prior to use.  On UNIX in debug mode, new() initializes to 0, but not on NT, nor in optimized mode on UNIX.

Files were built by running "gencpp -ansi -metafile".

Occurs in XXops.cpp, XX::link_row() when macro link_parent_bp_m is used.

An Access Violation occurred with the following call stack:  SC::link_row()    pr_link()         load_data()        pr_load()        main()

 

The actual line of code in link_row() is the macro link_parent_bp_m(ca,pa,cf,pk,p,c).  The violation occurs as marked below.  On NT, pa##curr2->c##_fcp (in the if statement) is non-null (garbage) after the new operator allocates pa##curr2.  But on unix, it points to NULL.  Regardless, it needs to be initialized before it is used.  Had it been initialized, then execution would have proceeded into the "if" block instead of the "else" block.

 

if (pa##curr2->c##_fcp == NULL) \

{ \

        pa##curr2->c##_fcp = ca##curr; \

        ca##curr->p##_bpp = pa##curr2; \

} \

else \

{ \

        ca##curr->p##_bpp = ca##temp; \

        ca##temp->p##_fpp = ca##curr; \  <--violation

} \

 

This bug does not show up on UNIX in debug mode because the fields in pa##curr2 happen to point to NULL when pa##curr2 is allocated.  In non-debug mode, the same behavior is observed.  In other words, any application using "new" should initialize its allocated memory before using it.

 

Solution:  XX::create_rows() must initialize fields to NULL.  Can probably use a func similar to make_cptr when writing out create_rows() for each table.

 

 


 

Appendix B - Source and documentation locations

 

Listed below is an index of the locations of all source code and documentation relevant to this project.  All paths listed are relative to $CASE/2kf522/kmiu/

           

gencpp_final_2000Dec/doc/phase1_example.doc contains an example Information Model for phase1 of the project

gencpp_final_2000Dec/doc/phase1_projectV2.doc is a report on GENCPP Enhancement to Use Container Classes for phase 1

gencpp_final_2000Dec/doc/phase2_specification.doc is the the functional and implementation specifications written for phase2

gencpp_final_2000Dec/doc/phase2_state_diagrams.doc contains the state diagrams for the two new classes, table and row, used in our new version of gencpp

gencpp_final_2000Dec/doc/FinalReport.doc is the final report that ties together all three phases of Object Oriented Analysis and Design. It discusses the pros and cons of both intrusive and non-intrusive data structures and explains our rationale for using non-intrusive data structures.

gencpp_final_2000Dec/src contains all the source code for our new version of gencpp

gencpp_final_2000Dec/test contains source code for the test we used to test gencpp. This dir also contains the *.sch, *.dat, and *.vdf files used for testing.

gencpp_2000Nov30 contains source code for bug fixes and NT port of ntansala's version of gencpp. This source code does NOT contain our project enhancements other than the bug fixes and NT port.

 

 

 

 

 

 

 

 

 

 

 

 

 

Appendix C - chgen to gencpp API Refactoring Hints

 

[RJLRef:  $PH/COOL_GEN/gencpp/2kf522_gencpp_nt_rl.txt]

 

This note specifies refactorings that C++ semantics required   from chgen pr_set/get macros to gencpp set/get functions, and from pr_add/delete to their gencpp alternatives. Now chgen13 has pr_accessors.c that closely resembles gencpp's get/set_field functions.

 

Naiyana T says gencpp does NOT support chgen pr_set_type macros. She also advises the gencpp runtime option -D USE_STL.  TBD: Verify that Naiyana T's advice to use option -D USE_STL is not required to have access to gencpp's O-O access  methods e.g. set_<<field name>>(). C++ is generated  with or without -D USE_STL, so set/get_<fieldname> methods  should also be generated in eithere case.

 

I am more interested in running gencpp without USE_STL.  Chgen13's pr_accessors.c is compatible with gencpp's  get/set_field. I want to migrate bde (and other apps)  to pr_accessors.c first, and then refactor source code  to call gencpp's get/set_field by simply editing functon calls.

 

Source code can be incrementally migrated from pr_set_<type>  to pr_accessors.c style and then to set_fieldname(). All three forms are intended to have the same return value  and side effects. (Gencpp may not support logging and replay.)

 

-------------------------------------------------

 

From ntansala@cs.uml.edu  Mon Sep 17 19:11:28 2001

Date: Mon, 17 Sep 2001 19:10:14 -0400 (EDT)

From: Naiyana Tansalarak <ntansala@cs.uml.edu>

To: 01f522

cc: Bob Lechner <lechner@cs.uml.edu>, Naiyana Tansalarak <ntansala@cs.uml.edu>

Subject: GENCPP

RJLRef: ~/public_html/2kf522/gencpp_nt.txt

            [--> $PH/COOL_GEN/gencpp/2kf522_gencpp_nt_rl.txt, RJL050911 ]

 

[Update 051027 to Rev. 021004 - RJL:

This note lists the chgen macro calls that must be changed  to gencpp class method calls in order to work with gencpp.

 

It also tells you how to build and run the gencpp executable, ASSUMING the pr_*.c files in gencpp/src have already been  generated from gencpp/src files.

 

However, it does not tell you how to generate gencpp/src/pr*.c, or how to select the new STL container option by -D USE_STL. 

 

The gencpp/src files include pr*.c files  that were produced by chgen from metadata source metaschema.sch.  The 2kf522 project used genv11 to create these pr*.c files; however, 'genv13 -ansi metaschema.sch'  should now be used because it has better logging and replay facilities, default options -ansi and -nobp,  64-bit alpha pointer support,  and it works with bde.

 

TBD: Identify and import metaschema.sch (tables SV, TT, TA, TS)

End RJL 021004 revision.

-----------------------

To use GENCPP, you may build your own GENCPP by copying source files from

$CASE/2ks522/kmiu/gencpp_final_2000Dec to your local directory and then run "make" to build GENCPP (before that you need to create subdirectory "bin" at 1 upper level of source file directory to locate GENCPP).

 

When compiling the generated files with your main program, don't forget to  add flag "-DUSE_STL" so that the system will recognize Standard Template  Library (STL).  You can refer to CHGEN Umanual ver8 except some macros have  been change as noted below and please refer this gencppFinalReport_km_mo_rl061210.*  for more details .

 

Note :  These macros no longer work, call class methods instead as follows:

====

No.  chgen Old Macros                         gencpp New Methods

==    ===================            =================

1.  pr_set_int(tbl,fld,value) set_<<field name with int type>> (value)

2.  pr_set_flt(tbl,fld,value) set_<<field name with float type>> (value)

3.  pr_set_key(tbl,fld,value) set_pkid(value) for primary key

                              set_XXid (value) for foreign key

4.  pr_set_str(tbl,fld,value) set_<<field name with string type>> (value)

5.  pr_get_int(tbl,fld)         get_<<field name with int type>> ()

6.  pr_get_flt(tbl,fld)         get_<<field name with float type>> ()

7.  pr_get_str(tbl,fld)       get_<<field name with string type>> ()

8.  pr_get_key(tbl,fld)       get_pkid() for primary key

                              get_XXid() for foreign key

9.  pr_add(view,tbl_abbr,tbl_ptr)   add_row(view)

10. pr_delete(tbl)                  delete_row()

11. last_child(AA,BB,AAid)                last_child(AA,BB,BBid)

 

 

Row

1. Creating row

 
 R1: Create row()

 

                     R4: Add row(row ID, viewname)

Text Box: 99. Deleting row

4. Adding row

 
                     

 

 


                 Initialize row.pkid = 0            Set row.pkid   Unlink any parents

                 If row data is from                validate row   Unlink any children

                   *.dat file then                    attributes        

                   Generate R2:                     Generate R3:        

                   Parse row(row ID,buffer,         Link row(row ID)    

                     idx, hcg_k)                                        

                 else

                  Generate R4:

                     Add row(row ID, viewname)

                                  

 

                                                    R3: Link row (row ID)

 

 

                R2: Parse row(row ID,

                 buffer, idx, hcg_k)

2. Parsing row

 
 


3. Linking row

 
                           R3: Link row(row ID)

 


             Set row.pkid

             parse buffer and                      Generate T3:Insert row(table ID,row ID)

               update attribute fields             link parents as needed

             Generate R3:  Link row(row ID)        link children as needed

                                                   Generate R5: Row OK(row ID

                                                  

 

 

5. Row OK

 
                          

                                                   R5: Row OK(row ID)

                                 

                                                               R99: Delete row(row ID)

 


                 R6: Dump row(row ID, viewname,

                 file_name, new_version,                      R7: Dump row (row ID,

                 modestr)                                     viewname, new_version,

                                                                           modestr)

   

6. Printing to file

 
                                                    R5: Row OK(row ID)

7. Printing to terminal

 
                             R5: Row OK(row ID)    

 


            

             Print out all attributes                 Print out all attributes                                  

              to file_name                              to terminal

             Generate R5: Row OK(row ID)              Generate R5: Row OK(row ID)

                                                          

     

             Transitions labeled R# are EventTypes addressed to a Row-class object.

             Transitions labeled T# are EventTypes addressed to a Table-class object.

Bibliography

 

Soukup Jiri: Intrusive Data Structures, article published in the C++ Report, in three parts between May and October 1998. (This article describes how to design a template library for intrusive data structures, and compares this library with traditional, container based class libraries). See also his book: Taming C++: Pattern Clases and Persistence for Large Projects,  A-W 1994.

 

Meyers, Scott: Effective C++, 2nd Edition, Addison-Wesley 1998, ISBN 0-201-92488-9

 

Page-Jones, Meilir: Fundamentals of Object-Oriented Design in UML, Addison-Wesley 1998, ISBN 0-201-69946-X

 

Lee, Richard and Tepfenhart, William: UML and C++ - A Practical Guild to Object-Oriented Development, Prentice Hall 1997, ISBN 0-13-619719-1

 

Eckel, Bruce: Thinking in C++ 2nd Edition, volumes I and II, Prentice Hall 2000, ISBN 0-13-979809-9

 

Lippman, Stan and Lajoie, Josee:  C++ Primer 3rd Edition, Addison-Wesley 1998, ISBN 0-201-82470-1

 

Coplein, James: Advanced C++: Programming Styles and Idioms, Addison-Wesley 1992, ISBN 0-201-54855-0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 



[1] Current location of this file: gencppFinalReport_km_mo_rl061210.doc .  (In this 061210 revision I added several new footnotes, Appendix C,  and an ADDENDUM to the last section on Future Work - RJ061210.)

[2] The 'public' meta-attribute may not be necessary. That depends on API information-hiding vs. performance tradeoffs that may not be relevant to a prototype. Future work needs to be done here. One refactoring step in this direction is the file pr_accessors.c from chgen13. This contains alternate field  access methods that are more like the usual ones than those called pr_set_typ() and pr_get_type() from chgen up thru  ver_12. (See ADDENDUM to Future Work.)

[3] This report by N Tansalarak on certain unavoidable changes to parent-child-set navigation API code:

$PH/COOL-GEN/gencpp/2kf522_gencpp_nt_rl.txt has been added here as Appendix C. - RJL061210.

 

[5] Vectors provides direct access iff the access keys are equal to or mappable to the actual Vector indices.

Vectors can provide n*log n access by binary search over Vector indices if pkeys occur in sorted order..

See ADDENDUM to Future Work

 

[6] A more effective way to dump to a file would be depth-first as objects are nested within some spanning tree of container objects, using the dump_row method of each row-class. Since the schema is not a tree, a (non-unique) spanning tree must first be specified.  A dump_database_tree method would traverse child_sets and dump their components depth-first, recursing into  composite compoments.) This dump could easily be transformed into a hierarchical XML format if desired by using the metadata in tables TT and TA.   An example of metadata use is given in $PH/06f522/asgnt3 for tables CK, CA and CQ (CandidateKeys, CandidateKeyToTableAttribute associations, and CandidateKeyQueries).

[7] Our current object creation discipline allocates pkeys in ascending order and never reuses deleted ones.  The row.insert method should begin at the row.end iterator and walk the table backward. This will usually avoid the search, but remains safe if future pkeys arrive out-of-order. Binary search of Vectors is also possible assuming pkeys occur in sorted order.

[8] For insertion into parent-child sets,  parent-key order is random. However, checking the last pkey used in the parent table is an effective form of cache and should be done before starting the search loop. Of course, a Vector indexed by [a range of] pkey row numbers would avoid the search altogether. See ADDENDUM to Future Work.

[9] Note that re-parenting a child object, by updating its fkey field, is equivalent to removing this object from its current parent's child set  and inserting it into the new parent's child set, without altering any of its other properties.  Both unlink and link calls are required.  See ADDENDUM to Future Work,

[10] This should work for a container class like table;  it would not work for an object (row) class because e.g. 'getters' and 'setters' and load and dump methods are field-name rather than field-type specific.