|
91.522 GENCPP Team—Mark Ottesen and Katherine Miu Enhancements to GENCPP
Replacing Intrusive Data Structures with Non-Intrusive Data Structures December 16, 2000 |
Table of Contents
Background
*Goals
*Compilability
*Portability
*Encapsulation
*Information Hiding
*Compatibility
*Intrusive vs. Non-intrusive Data Structures
*STL Lists vs. Vectors
*Process
*Bug fixes – Port to Win32/Visual C++
*Refactoring database output files
*Container operations
*Changes to macros and the API
*Testing
*Future Work
*Appendix A - Porting Changes and Bug Fixes to gencpp
*Appendix B - Source and documentation locations
*Bibliography
*
Gencpp stands for C++ generator. It is an application which came into being as an extension of an older application, chgen. The purpose of gencpp is to generate C++ code to provide a memory-resident network database based on a relational schema provided by application developers and to support a C++ language API to that database. The remainder of this document discusses modifications and enhancements to the code generated by gencpp unless otherwise noted. All changes relating to the generated code will be worked back into gencpp so that gencpp will create the modified files. To further clarify the difference between gencpp and the code it generates, this report will refer to the generated code as "database code" or "generated database code".
The first goal of this project has been to change gencpp in such a way that the code it generates will compile and link without alteration. Before we began changing the output generated by the current version (ntansala’s version) to support our architectural changes, we needed to debug and fix the current version so that it met this goal. The most obvious difference between the code generated by chgen and the code generated by gencpp is that chgen generates C code, and gencpp generates C++ code. Since the C language is entirely a subset of C++, porting an application from C to C++ can be as simple as using a C++ compiler instead of a C compiler. However, C++ requires stricter type checking than C does. For instance, in C++ a function must have a return value type declared, even if it is void. C does not have this requirement. In addition, there are rules in ANSI C++ about how void* can be used that are stricter than those in C. A list of the changes and bug fixes required to the original gencpp is listed in Appendix A.
The second goal of this project has been to have gencpp compile with gcc versions 2.8.1 and 2.9.5, as well as Visual C++ versions 5 and 6. In addition, this project has also made the generated database code compile with g++ versions 2.8.1 and 2.9.5, as well as Visual C++ versions 5 and 6. In theory, C and C++ are portable languages such that if the programmer follows ANSI standards, code that will compile on one platform will also compile on any other platform with any other compiler that claims to be ANSI-standard compliant. Anyone who has written a program longer than twenty lines and has had to compile it on different hardware platforms, or even the same platform and operating system with a different compiler, knows that reality is quite different. In order for the generated code to be useful in applications written for the Windows environment, such as the Visual C++ port of BDE, it has to compile with Visual C++. Again, a list of the changes and bug fixes required to the original gencpp is listed in Appendix A.
The third goal of this project is to model a relational database table and database row by defining a table class and row class, respectively. As previously stated, the C language is entirely a subset of C++. That being said, there are features available in C++ that were not available in C. Because of its roots in C, C++ isn’t considered to be a pure object oriented language like Smalltalk or even Java. However, as a hybrid object-oriented language, C++ has many facilities to support the object-oriented paradigm. The current version of gencpp is a first step in the evolution of chgen into the realm of C++ language support and the additional facilities it contains by creating C++ classes whose instances are rows of a database table. At this time, these rows are the only objects in the system represented by classes. This project enhances the database code such that tables are represented by classes as well, and the one to many relationship between these two objects is properly defined and attributed.
Page-Jones (p.9) defines encapsulation as "the grouping of related ideas into one unit, which can thereafter be referred to by a single name". That being said, there are operations in the current row class that are more closely related to the table object than the row object. These operations have been moved from the row class into the table class. In addition, certain responsibilities pertaining to the relationship between rows and tables have been moved from one type of row object to another. These will be discussed in greater depth.
The fourth goal of this project is to hide the implementation details of the generated database code, with the idea that the underlying implementation can be replaced at a later time, if desired, with a minimal impact on client code. The current version of gencpp creates code that starts down the path toward using the most common idioms of object-oriented programming by combining attributes common to every row object into a class and hiding the implementation of these specific attributes behind accessor and modifier methods. However, the internal member variables of the class that are part of the implementation of the class are publicly visible. These include the next_ptr and prev_ptr member variables in the row class that are used to implement the table, as well as the <child>_fcp, <child>_bcp members of the parent row class and the <parent>_fpp, <parent>_bpp, and <parent>_pp members of the child row class. Hiding these implementation details by making them private allows the implementation of the parent-child relationships to be transparent to the client application. This hiding of the implementation is an important aspect of object-oriented design.
The fifth goal of this project was to make our changes in such a way that the database code generated under the new method would be completely compatible with applications written to use the old gencpp API. This would demonstrate that a complete redesign of the implementation of tables and parent-child relationships within the classes in the database code generated by gencpp could be done in such a way that it is completely transparent to client applications. While the data types of certain global variables in the database code have changed types, client applications can be programmed so as to make their code entirely compatible with both the new and old versions of gencpp in a transparent fashion.
Intrusive vs. Non-intrusive Data Structures
Once the implementation is hidden, we then have the option of changing it. The database code uses what Soukup would call intrusive data structures, structures that contain pointers within the application objects, to maintain database tables and rows. In fact, a table is nothing more than a linked list of rows. One thing that the C++ language provides above and beyond what the C language provides is the Standard Template Library, contributed to the C++ standard by Alexander Stepanov and Meng Lee at Hewlett Packard. The Standard Template Library provides a set of containers for various purposes, including the list and the vector. These containers are non-intrusive, in the sense that the internal pointers that connect elements of the container are not embedded within the object itself. The feasibility of using these non-intrusive containers is discussed below.
There are advantages and disadvantages to both approaches. The current intrusive implementation has the advantage of having less overhead in terms of the individual size of the object. However, as Soukup himself says (C++ Report, 1998), "Whether the list uses one or two pointers is usually insignificant even in applications where memory space is at a premium, because the objects have other data members, and the addition of one pointer represents only a relatively small increase in object size. Traversing non-intrusive lists requires an additional pointer jump, which again is insignificant compared to other processing the application would do while traversing a list". All other things being equal, smaller and faster is always better than bigger and slower, especially in high volume transactions. Soukup goes on to say that in a particular high-volume model pertaining to large telephone switches, he replaced "non-intrusive data structures with intrusive ones, and the speed almost doubled". The reason for this was because "In applications which frequently create and destroy objects, the main overhead comes from the allocation, initialization, and destruction of objects, not from adding and deleting items from the list. Non-intrusive lists include twice as many objects, or require periodic reallocation and copying of the array". While this is true, it doesn’t necessarily fit the needs of the applications that use our code today. The largest current application using the generated database code, the Juice Plant Simulation, contains only 1500 lines of data in all of its tables combined. Furthermore, the range of our current key structure is a limit on scaling up the size of application (E-mail from R. Lechner to 91.522 class on 11/16/2000). As far as the overhead of periodic reallocation and copying of the array, the list container from the STL does not have this problem, only the vector does. Even at that, Prof. Lechner wrote to the current project team in a separate e-mail dated 12/03/2000, "I don't believe copying the entire Vector during resizing is a significant performance hit. Memcpy of the entire array is fast, probably a lot faster than the <pr_create/set_field+/pr_add> incremental construction time (even ignoring parsing) and the cumulative number of copy operations is just the new Vector size (or twice the old size).
Another advantage cited for intrusive data structures over non-intrusive ones is that random insert or remove operations from a doubly linked list are almost instant, but a linear search is necessary for the non-intrusive list."(Soukup). This does not apply to our case. Rows are inserted into database tables in sorted primary key order, so a linear search is required regardless of whether the data structure is intrusive or non-intrusive. In addition, removal from tables and parent-child lists does not happen at random. Only particular values are removed, requiring a linear search to find them whether or not the containers are intrusive or non-intrusive. When talking about insertion, the largest number of insertions happen for our memory-resident database when the application is initialized and loaded from disk. The current implementation of "pr_load pr_links TOO MUCH, eagerly" (E-mail from R. Lechner to 91.522 class on 11/16/2000). This is not necessarily a statement about the general debate between intrusive and non-intrusive data structures, but it does speak to the particular application being studied.
In gencpp generated code, the developers of gencpp over the years have been responsible for the maintenance and evolution of the intrusive data structures. On the other hand, the compiler manufacturers are responsible for providing the implementation of the STL. This gives the STL a distinct advantage that has nothing to do with performance or even the debate between intrusive and non-intrusive data structures. By virtue of being a client user of a library, we don’t have to be responsible for the direct manipulation of the pointers in the implementation of our table and our parent-child relationship – the library is responsible for this. Regardless of the implementation, we can use a full library of code developed and fully tested by professionals before we write one line of code in a single-semester student project. Given the time restrictions usually imposed upon student projects, the impact of this can not be overstated. The implementation of the insert_element macro in the current generated database code takes up forty-three lines of pointer manipulation code that is difficult to read. Our STL container implementation of the table object cuts this down to one very clear line of code. In addition, a macro can’t be debugged in a source-level debugger such as gdb because it is preprocessed code. Neither can it be read directly in the case of compiler problems, again because it is not directly in the source code. Inline template functions can serve the same purpose in C++ as macros in C, but that doesn’t help with the forty-two unnecessary lines of source code to generate and debug.
The STL container classes provide another form of abstraction from the internal implementation of the data structure by having a common interface to all container types through common methods and iterators that behave in a syntactically identical fashion. For example, we can write an implementation today using lists for compatibility reasons. On the other hand, should we change our minds and choose to use vectors in a year, this change can be made to the entire set of database code simply by replacing the type of variable defined for the container and each iterator. This is because the operations on the container remain the same regardless of type. This particular experiment has been done in this project. In addition, the C++ standard requires certain performance guarantees from implementers of the STL. For instance, a vector is required to provide constant-time access to it’s elements, otherwise it isn’t a valid vector implementation. (Meyers, p. 229) .
In addition, using the STL provides access to a whole set of algorithms dedicated to the manipulation of the containers it provides. Tom Janzen enumerates some of these in his project report from the Fall semester of 1997. These algorithms provide a whole range of operations for use with container. The algorithms, once again, have performance standards imposed on them by the C++ standard. For example, the stable_sort must perform in O(N log N) time in order to comply with the standard (Meyers, p. 230).
From a philosophical point of view, another question arises. In the current intrusive implementation, the member variables in the class and the objects they reference really don’t reflect the real world relationships between the objects in the information model. For instance, the fkey_fcp and fkey_bcp pointers in a child row object represent a "next sibling" and "previous sibling" relationship between child row elements. Is this really the relationship being abstracted by the objects? The purpose of these variables is to reflect a relationship between the child row objects and their parent rows. The relationship between one child row and the next according to "next sibling" and "previous sibling" is purely coincidental and a consequence of the one to many relationship from the parent to the child relationship. Normally this would be implemented by having only a referential attribute in the CO object pointing at the single parent SD object. In this case, that is reflected by the SDid_pp attribute. However, Teptenhart says by way of Rumbaugh that "if fast traversal is critical, then it would be better to add attributes to both sides of the association. In this case, the attribute added to the class on the ‘one’ side is not a buried pointer, but a set of pointers to a class object on the other side" (UML and C++ [Prentice Hall, 1997] p.259). In this project, a C++ standard library container class has been used in the parent row to implement this relationship, eliminating the need for previous and next pointers in the child row class. The direct parent pointer has been maintained.
One should bear in mind that in practical terms, a certain level of intrusiveness is necessary. For instance, in the case of a parent-child relationship, in order to provide bi-directional navigability between parent and child, the child must have a pointer to the parent. "Therefore the organization is intrusive at its very heart". (Soukup)
In addition, the table as an object does not exist in the current intrusive. Like the relationship between sibling rows of a common parent, the relationship between a set of rows in the same table is really just a consequence of being multiple objects of the same type which can collectively be called a table. Currently, the row class is responsible for maintaining the abstraction of the table by maintaining links to the next and previous rows with the next_ptr and prev_ptr attributes. Like the parent-child relationship, these attributes don’t abstract a real world property of the row. By implementing the table as a separate object, the row classes in turn are a more direct reflection of the objects they are designed to abstract, the one to many relationship between the table and the rows of data it contains.
What this project has done is implement the tables and parent-child relationships as STL list objects. In the case of the table, it is a globally defined object containing a private list of pointers to rows of the table. In the case of parent-child relationships, the model suggested by Lee and Tepfenhart cited earlier is followed. For each foreign key in the child row (the "many" side of the relationship) that points back to the parent, there is a direct pointer back to the parent row (the "one" side of the relationship) for that particular field. The parent row contains a list of pointers for each foreign key from a child that points back to it. A simple example from a student database representing the prerequisite relationship between courses is shown below.
In this example, suppose that in the associative object prerequisite, COid1 is an advanced course that has a course COid2 that is required as a prerequisite. In this case, a given course object will contain two lists. The PRid1_children list member will contain pointers to each course that has the given course as a prerequisite, and the PRid2_children list will contain pointers to each course that it is a prerequisite for.
Another advantage if having the container of child lists in the parent object is that the data structure is homogenous in content. The current implementation, being a circular linked list which contains the parent object in addition to all the child objects, is not. This causes a problem because the _fpp and _bpp pointers in the child object become type (void*), leading to compilation problems with C++ compilers when the (void*) member is cast or dereferenced. Since C++ is stricter in its type-checking than C, this is an example of something that used to work with a C compiler that no longer is legal in C++.
STL list containers are used for these objects. We considered the STL vector container, but chose lists because of differences in the implementation between the list and the vector, and compatibility with existing applications. The analogy between STL lists and vectors is quite similar to the analogy between an array and a dynamically allocated linked list in standard C programming. A vector represents a contiguous area of memory in which each element is stored in turn. A list represents noncontiguous memory doubly linked through a pair of pointers to the front and back. The vector supports very efficient random access because each access is a fixed offset from the beginning of the vector. The list, on the other hand, does not support direct random access to its elements, so retrieval of elements by direct address can be slower through the list class. It takes the same amount of time to insert at the beginning or end of either a vector or a list. However, if the insertion needs to happen in other locations, the list is the clear winner over the vector. This is because in the vector, each element after the inserted element must be recopied and moved, just as if you were inserting a new element into an array where a new slot had to be created for it in the middle. This presents a much bigger problem for us than just performance. Suppose we have retrieved a pointer or an STL iterator to an element in the vector. We then insert a new element at some point in the vector before the element we have stored a reference to. The insertion will cause our stored element to a new location in memory, causing our pointer or iterator to no longer be valid. This is extremely problematic for us because table elements are stored in sorted order, meaning that we are likely to insert elements in locations throughout the table at any point in time. With a list, our pointers and iterators will always remain valid, so we don't have to worry about this. Each time a new element is inserted into a list, the front and back pointer of the two elements between which the new object is being inserted are reassigned to point to the new object, and the front and back pointers of the new object in turn are initialized to address these two elements. This problem manifests itself in a much larger scale when the vector resizes itself. When a vector reaches its capacity, it resizes itself by allocating double its current capacity, copying its current values into the new allocated memory, and freeing the previous memory. However, the point at which the memory is reallocated is implementation dependent. The version of the STL that ships with gcc starts with an initial size of 1, and then doubles its size when it reaches its capacity. The RogueWave implementation starts with a size of 256 for integers, 128 for doubles, and smaller sizes for larger objects, depending on their complexity. A vector can be resized using its resize() method, or an initial size can be set using the reserve() method, allowing us to keep track of pointers and iterators we know about and updating them when they need to be updated, but we can’t keep track of iterators that the user declares for their own use. As far as performance goes, even with many random insertions, a vector will generally be faster than a list with small objects, but for a complex object (meaning one with a copy constructor and/or an overloaded assignment operator), a list is faster, as shown in the tables below.
Time in Seconds to insert 10,000,000 elements
|
Data Type |
List |
Vector |
|
Int |
10.38 |
3.76 |
|
Double |
10.72 |
3.95 |
|
Simple Class |
12.31 |
5.89 |
|
String |
14.42 |
11.80 |
Time in Seconds to insert 10,000 elements
|
Data Type |
List |
Vector |
|
Large Simple Class |
0.36 |
2.23 |
|
Large Complex Class |
2.37 |
6.70 |
This chart is from (Lippman, p.258). This shows that there are circumstances where a list is clearly preferable in performance to a vector.
Bug fixes – Port to Win32/Visual C++
The first step in this project was to get the original gencpp application to compile and run without warnings, as well as to have it generate code that would compile and run without warnings. In order to be able to work in the graphical environment of Visual C++, this project produced changes such that gencpp would compile and run with this tool.
Building gencpp on NT
Here are the steps that need to be taken to build gencpp in the Visual C++ environment. This was done with Visual C++ version 5.0.
A list of changes to gencpp necessary to generate database code that compiles and
links with both Visual C++ and g++ is provided in Appendix A. These are not limited to porting changes. When this project was started, the generated code would not compile under g++. The code generated now compiles, although under Visual C++ there are still some warnings about comparisons between signed and unsigned integers.
Building Database Applications
Visual C++ does not recognize the ".cc" extension as a valid file extension for C++ source files. The convention in Visual C++ is to use the extension ".cpp". Gencpp has been updated to generate .cpp files if the symbol "WIN32" is defined by the compiler. This symbol is defined on Windows 32 bit platforms under the Visual C++ compiler. It is not necessarily defined using other compilers even on Windows 32 bit platforms. For example, one of the authors tested this project on Windows 95 using a port of gcc and g++ to the Win32 platforms. This was for the purpose of verifying that changes made during the port to Visual C++ left the code in a state such that it would still build and run in a UNIX environment using gcc. With gcc on Windows 32 bit platforms, "WIN32" is not defined, so the ".cc" file extension is still used as it is on UNIX platforms.
In order to leave the generated database code in such a state that it could be tested under both the new and the old methods of storing tables and parent-child relationships, the database code is generated so that if the "USE_STL" flag is defined, the new implementation of tables and parent-child relationships using STL lists is used. If this flag is not defined, client applications will use the older intrusive internal representation of tables and parent-child relationships. If the "USE_STL" flag is intended to be defined when compiling the generated code, the "-nobp" command line option to gencpp is irrelevant, because the forward and backward pointers to parents and children within the row objects are now hidden within the STL list object members of the row classes.
The steps necessary to set up a workspace and project in Visual C++ to build a project and workspace for a database client application depends on the nature of that application. To set up a Win32 Console Application, the same steps can be used for this workspace as were used for gencpp itself. If you are using the database code with an MFC application or within a dynamic link library, the steps will need to be adjusted accordingly. As with the previous version of gencpp, "#define MAIN" must be at the top of the file that contains main as its entry point.
Under Visual C++, there are still a few warnings about unreferenced local variables in the add_row and link_row class methods of the row class when the USE_STL macro is defined. This is because these variables are used in these functions when the macro is not defined, and as such needed to be left in the generated code. Fixes to gencpp in order to have the generated code be compilable under g++ and Visual C++ are included in Appendix A.
Refactoring database output files
The evolutionary path of gencpp from chgen has been one of taking incremental steps from an application generating code in a language that is not object oriented by nature towards generating code in a language that is more object oriented by nature. The first step was to have the code generate classes for some objects in the system, namely the rows of the table. This was a big step, and a necessary first step to any other work. The C code generated by chgen used a series of macros as its API as a means of hiding the implementation details of table objects and relationships, as well as to provide a typeless means of reusing code internally.
The refactoring of generated output files starts with the row classes. Please see the specification document for phase two of this project for a complete list of the code changes in the output files. This document can be found in $CASE/2kf522/kmiu/gencpp_final_2000Dec/doc/STLspec5.doc. Using the data model described above in "STL Lists vs. Vectors", here are the row class declarations generated using the old method
class CO : public RC /* CO - Course information */
{
private :
hcg_key DEid; /* foreign pkey of dept */
char name[16]; /* course name */
public :
DE *DEid_pp;
void *DEid_fpp;
void *DEid_bpp;
PR *PRid1_fcp;
PR *PRid1_bcp;
PR *PRid2_fcp;
PR *PRid2_bcp;
CO *prev_ptr;
CO *next_ptr;
void set_DEid (hcg_key id);
void set_name (char *value);
hcg_key get_DEid ();
char* get_name ();
CO* create_row();
void link_row () ;
void parse_row (char *buffer, int idx, hcg_key hcg_k) ;
void add_row (char *viewname) ;
void delete_row () ;
void dump_table (char *viewname, char *file_name, int new_version, char *modestr) ;
void dump_row (char *viewname, char *file_name, int new_version, char *modestr) ;
} ;
class PR : public RC /* PR - Prerequisite */
{
private :
hcg_key COid1; /* foreign pkey of course that has this prereq */
hcg_key COid2; /* foreign pkey of course that describes this prereq */
char test[36]; /* field used for testing only */
public :
CO *COid1_pp;
void *COid1_fpp;
void *COid1_bpp;
CO *COid2_pp;
void *COid2_fpp;
void *COid2_bpp;
PR *prev_ptr;
PR *next_ptr;
void set_COid1 (hcg_key id);
void set_COid2 (hcg_key id);
void set_test (char *value);
hcg_key get_COid1 ();
hcg_key get_COid2 ();
char* get_test ();
PR* create_row();
void link_row () ;
void parse_row (char *buffer, int idx, hcg_key hcg_k) ;
void add_row (char *viewname) ;
void delete_row () ;
void dump_table (char *viewname, char *file_name, int new_version, char *modestr) ;
void dump_row (char *viewname, char *file_name, int new_version, char *modestr) ;
} ;
Regardless of the internal implementation of the tables and the parent-child relationships, that fact that the internal pointers prev_ptr and next_ptr used to implement the tables are public doesn’t conform to the object-oriented concept of information hiding. In addition, the pointers COid1_pp and COid2_pp used in class PR to point directly back at their parent rows are public. So are the *_fcp and *_bcp pointers in class CO, and the *_fpp and *_bpp pointers in class PR. Even if one should choose intrusive data structures over non-intrusive ones, these pointers would have to be made private and wrapped in accessor methods.
Here is the code for the same two row classes using the new method of storing parent-child relationships and tables externally. State diagrams describing the behavior of these classes were created for phase two of this project and can be found in $CASE/2kf522/kmiu/gencpp_final_2000Dec/doc/State_diagram_for_gencpp.doc
class CO : public RC
{
private :
hcg_key DEid; /* foreign pkey of dept */
char name[16]; /* course name */
DE *DEid_pp; /* Parent */
list<PR *> PRid1_children; /* Child list */
list<PR *> PRid2_children; /* Child list */
public :
void set_DEid (hcg_key id);
void set_name (char *value);
hcg_key get_DEid ();
char* get_name ();
void insertChild_PRid1(PR *childRow); /* Adds a child to list */
void removeChild_PRid1(list<PR*>::iterator rowIter); /* Remove child */
list<PR*>::iterator getFirstChild_PRid1(); /* Returns 1st child */
list<PR*>::iterator getLastChild_PRid1(); /* Returns last child */
list<PR*>::iterator PRid1_Terminator(); /* Returns terminator */
void insertChild_PRid2(PR *childRow); /* Adds a child to list */
void removeChild_PRid2(list<PR*>::iterator rowIter); /* Remove child */
list<PR*>::iterator getFirstChild_PRid2(); /* Returns 1st child */
list<PR*>::iterator getLastChild_PRid2(); /* Returns last child */
list<PR*>::iterator PRid2_Terminator(); /* Returns terminator */
void setParent_DEid(DE *currentRow); /* Initializes parent */
DE *getParent_DEid(); /* Returns parent */
void link_row () ;
void parse_row (char *buffer, int idx, hcg_key hcg_k);
void add_row (char *viewname);
list<CO*>::iterator delete_row (list<CO*>::iterator rowIter); /* Deletes the row */
/* Prints a row to a file */
void dump_row (char *viewname, char *file_name, int new_version, char *modestr);
/* Prints a row to terminal */
void dump_row (char *viewname, int new_version, char *modestr);
CO(); /* Constructor */
~CO(); /* Destructor */
} ;
class COtable /* CO - Course information */
{
private :
char abbrev[5]; /* 2-char abbreviation */
char *name; /* Table name */
list<CO*> row; /* List of rows */
public :
char *getAbbrev(); /* Returns table abbreviation */
char *getName(); /* Returns table name */
unsigned int getRowCount(); /* Returns number of rows in table */
list<CO*>::iterator getFirstRow(); /* Returns iterator to first row */
CO *getPrevRow(CO *curr); /* Returns prev iterator */
CO *getNextRow(CO *curr); /* Returns next iterator */
CO *getLastRow(); /* Returns last iterator */
list<CO*>::iterator Terminator(); /* Returns list terminator */
bool isFirstRow(CO * curr); /* True if curr is iterator to first row */
bool isLastRow(CO * curr); /* True if curr is iterator to last row */
void insertRow(CO *newRow); /* Inserts a row at end of table */
list<CO*>::iterator removeRow(list<CO*>::iterator i); /* Removes row from table */
/* Prints out entire table to file */
void dump_table(char *viewname, char *file_name, int new_version, char *modestr);
/* Prints out entire table to terminal */
/* void dump_table(char *viewname, int new_version, char *modestr); */
COtable(char *abbrev, char *name); /* Constructor */
~COtable(); /* Destructor */
};
class PR : public RC
{
private :
hcg_key COid1; /* foreign pkey of course that has this prereq */
hcg_key COid2; /* foreign pkey of course that describes this prereq */
char test[36]; /* field used for testing only */
CO *COid1_pp; /* Parent */
CO *COid2_pp; /* Parent */
public :
void set_COid1 (hcg_key id);
void set_COid2 (hcg_key id);
void set_test (char *value);
hcg_key get_COid1 ();
hcg_key get_COid2 ();
char* get_test ();
void setParent_COid1(CO *currentRow); /* Initializes parent */
CO *getParent_COid1(); /* Returns parent */
void setParent_COid2(CO *currentRow); /* Initializes parent */
CO *getParent_COid2(); /* Returns parent */
void link_row () ;
void parse_row (char *buffer, int idx, hcg_key hcg_k);
void add_row (char *viewname);
list<PR*>::iterator delete_row (list<PR*>::iterator rowIter); /* Deletes the row */
/* Prints a row to a file */
void dump_row (char *viewname, char *file_name, int new_version, char *modestr);
/* Prints a row to terminal */
void dump_row (char *viewname, int new_version, char *modestr);
PR(); /* Constructor */
~PR(); /* Destructor */
} ;
class PRtable /* PR - Prerequisite */
{
private :
char abbrev[5]; /* 2-char abbreviation */
char *name; /* Table name */
list<PR*> row; /* List of rows */
public :
char *getAbbrev(); /* Returns table abbreviation */
char *getName(); /* Returns table name */
unsigned int getRowCount(); /* Returns number of rows in table */
list<PR*>::iterator getFirstRow(); /* Returns iterator to first row */
PR *getPrevRow(PR *curr); /* Returns prev iterator */
PR *getNextRow(PR *curr); /* Returns next iterator */
PR *getLastRow(); /* Returns last iterator */
list<PR*>::iterator Terminator(); /* Returns list terminator */
bool isFirstRow(PR * curr); /* True if curr is iterator to first row */
bool isLastRow(PR * curr); /* True if curr is iterator to last row */
void insertRow(PR *newRow); /* Inserts a row at end of table */
list<PR*>::iterator removeRow(list<PR*>::iterator i); /* Removes row from table */
/* Prints out entire table to file */
void dump_table(char *viewname, char *file_name, int new_version, char *modestr);
/* Prints out entire table to terminal */
/* void dump_table(char *viewname, int new_version, char *modestr); */
PRtable(char *abbrev, char *name); /* Constructor */
~PRtable(); /* Destructor */
};
First of all, the table classes for each type of table are now separate classes to support the concept of the table as an external container. The most interesting member variable is the list<PR*> row member variable which replaces the next_ptr and prev_ptr member variables. In addition to holding pointers to all of the rows and consolidating these pointers in one place, the list<PR*> row member variable is private to the table class. If one wishes to access a row of the table, a message must be sent to the table to retrieve an iterator to the row.
The original generated database code contained the following declarations of global variables:
class PR *PRbegin, *PRcurr, *PRtemp, *PRend, *PRelt, *PRcurr2, *PRtemp2;
These pointers have been replaced with STL iterator objects in order to assist in navigating the list containers.
list<PR*>::iterator PRcurr, PRtemp, PRelt, PRcurr2, PRtemp2;
An iterator to an STL container is more than just a pointer - it is an object in its own right that can be constructed, assigned to, and copied. It contains overloaded operators so that it can be dereferenced in an identical fashion to a pointer. It can also be incremented and decremented the same way a pointer can. The goal of an iterator is to be able to function as a pointer can, but to provide additional functionality specific to the STL container idiom. This can be demonstrated by looking at the new version of the child_loop macro that is generated as compared to the old one.
Here is the old version of the macro. Notice that in order to increment the child##curr pointer to the next child, the application needs to know the name of the foreign key field that is being followed.
#ifndef USE_STL
#define child_loop(parent,child,pkey,ckey) \
for (child##curr = parent##curr->pkey##_fcp; \
((child##curr != NULL) && ((parent *) child##curr) != (parent##curr)); \
child##curr = (child *) child##curr->ckey##_fpp)
#endif /* USE_STL */
Below are the macros which comprise the same child loop in the new version. Notice how the next_child macro does not need to be told when it increments child##curr which list it is following. This is because when an iterator is assigned to, it contains a reference to the specific container instance that it is following. In our case, if start_child_chain initializes PRcurr to follow
COcurr->getFirstChild_PRid1
, then COcurr++ understands that it needs to point to the next element of the PRid1_children container.#undef start_child_chain
#define start_child_chain(parent,child,fkey) \
child##curr = (*parent##curr)->getFirstChild_##fkey()
#undef next_child
#define next_child(child,fkey) \
child##curr++
#undef child_loop
#define child_loop(parent,child,pkey,ckey) \
for(start_child_chain(parent,child,pkey); child##curr != (*parent##curr)->pkey##_Terminator(); \
next_child(child,pkey))
This added functionality and the ability of the iterator to retain state within the container that it points to was the biggest reason we changed the type of the global cursors from pointers to iterators. By making the data structure non-intrusive, a pointer to a row object can no longer obtain the location of the next row within that table or the next sibling of a common parent row, because the row itself no longer contains this information. As a consequence, to get the next row of a table from only a pointer to a row object, one would have to search the appropriate container in a linear fashion to obtain an iterator to the list node that contains the pointer to the row object, and then increment that iterator. This made keeping the global variables as type "pointer to row" prohibitively expensive from a performance point of view, so we did not implement complete backward compatibility in our generated database code. Such a version is possible, however. A simple test application using this methodology is in the $CASE directory subtree. Its location will be listed in Appendix B.
When discussing the generated database methods, there are three general cases to consider.
These three cases have two separate subcases to consider. The first is the operation on a row as an element of a table container. The second is the operation on a row as an element of a child list container.
Inserting a row into a table container is done in two ways. When an application is initialized, pr_load is called. Pr_load calls a function called load_data, which then calls the row object’s link_row method. When a row is added after initialization, the application calls the add_row method of the row object, which does many of the same things that load_data does in a type-specific way, and then also calls the row object’s link_row method.
The link row method can have two basic forms, depending on whether or not the element being inserted is a parent row or a child row. In our running example, the two link row methods look as follows:
// This is the child row object
void PR::link_row()
{
hcg_key PRid;
#ifndef USE_STL
insert_element(PR);
#else
PRtab->insertRow(this);
#endif /* USE_STL */
link_parent_bp_m(PR,CO,COid1,COid,COid1,PRid1);
link_parent_bp_m(PR,CO,COid2,COid,COid2,PRid2);
}
// This is the parent row
void CO::link_row()
{
hcg_key COid;
#ifndef USE_STL
insert_element(CO);
#else
COtab->insertRow(this);
#endif /* USE_STL */
link_child_bp_m(PR,CO,COid1,COid,COid1,PRid1);
link_child_bp_m(PR,CO,COid2,COid,COid2,PRid2);
link_parent_bp_m(CO,DE,DEid,DEid,DEid,COid);
}
Insertion becomes much simpler when using the list containers. The 43 lines of pointer manipulation code in the macro insert_element that are used under the old scheme are replaced by a call to the row’s insertRow method. The code for this method is straightforward. This code simply walks the row list and inserts the entry into the row in sorted order.
void PRtable::insertRow(PR *newRow)
{
list<PR*>::iterator iter;
iter = row.begin();
while (iter != row.end() && (*iter)->get_pkid() < newRow->get_pkid())
iter++;
row.insert(iter, newRow);
}
When a new row is inserted, it will call the macro link_child_bp_m to connect itself to all of its children, and link_parent_bp_m to connect a row to its parents. Since the type of the container and number of pointers are transparent to the user, the twelve macros link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m
have been reduced to link_child_nobp_m, link_parent_nobp_m, unlink_parent_nobp_m, and unlink_child_nobp_m. The rest have been #define d to these four. The link_child_bp_m macro looks like this.
#define link_child_bp_m(ca,pa,cf,pk,p,c) \
ca##curr2 = ca##tab->getFirstRow(); \
while(ca##curr2 != ca##tab->Terminator()) \
{ \
if (key_compare1((*ca##curr2)->get_##cf(),this->get_pkid()) == 0) \
{ \
(*ca##curr2)->setParent_##p(this);\
insertChild_##c((*ca##curr2)); \
} \
ca##curr2++; \
}
The key element here is the insertChild_##c call, which adds a pointer to the child value to the container for the correct key field pointing upward from the child to the parent’s primary key. Before doing this, the instance set the child row’s parent pointer to point at itself. The naming convention for member variables used in the old code is maintained to determine which set of field values need to be compared for insertion into which childlist. Notice that for compatibility with existing applications, the current interface to the macros is maintained even if this leaves some of the arguments unused.
The code for linking a row to all of its parent rows from the perspective of the child looks like this:
/* We use 'this' ptr in place of ca##curr */
#define link_parent_bp_m(ca,pa,cf,pk,p,c) \
for (pa##curr2 = pa##tab->getFirstRow(); pa##curr2 != pa##tab->Terminator() && (key_compare1((*pa##curr2)->get_pkid(),this->get_##cf()) < 0); pa##curr2++); \
if ((pa##curr2 != pa##tab->Terminator()) && (key_compare1(this->get_##cf(),(*pa##curr2)->get_pkid()) == 0)) \
{ \
this->setParent_##p((*pa##curr2)); \
(*pa##curr2)->insertChild_##c(this); \
}
This code walks the parent table until it finds the primary key which matches the given foreign key field in our table. When it finds the row, which is in sorted order in the table, the operation is the same from the child perspective as it was from the parent perspective, except the roles are reversed. The child sets his own parent pointer to point at the parent he has found, and then sends a message to the parent to insert it into the parent’s corresponding child list.
Traversal starts with getting an iterator to the first element of the container in question, whether it is a table container or a list of children corresponding to a foreign key in the child row. It then increments by incrementing the iterator, and then terminates upon reaching the end of the container. For table containers, the table_loop macro traverses the entire table
#define table_loop_all(tbl) for (tbl##curr = tbl##tab->getFirstRow(); tbl##curr != tbl##tab->Terminator(); tbl##curr++)
The getFirstRow() method of the table class simply returns row.begin(), and the Terminator() method returns row.end(). The begin() and end() methods are methods common to any STL container, whether it is a vector, list, or even a string. They return an iterator to the given location. Users should be aware that the end() method of an STL container actually points at a sentinel value one element past the actual end of the container. If this entry is being translated into a pointer, the user must make certain to set the pointer value of the iterator returned from an end() call to null, because the compiler may not do so. Incrementing the iterator is simple because once it has been assigned to an element from a particular container, it retains the context of that container, so incrementing the value automatically returns the next value from the correct list, be it a child list or a table list.
Traversing a child list is just as easy. The following macros are used to traverse child lists:
#define start_child_chain(parent,child,fkey) \
child##curr = (*parent##curr)->getFirstChild_##fkey()
#define next_child(child,fkey) \
child##curr++
#define child_loop(parent,child,pkey,ckey) \
for(start_child_chain(parent,child,pkey); child##curr != (*parent##curr)->pkey##_Terminator(); \
next_child(child,pkey))
The logic is exactly the same as it is for traversing the table, with the additional specification in the third argument of the suffix to the child list to follow, and the specification in the fourth argument of the name of the foreign key in the child that points upwards towards the parent.
Removal of a row from a table is done by calling the row’s delete_row() method. The delete_row method looks like this:
list<CO*>::iterator CO::delete_row (list<CO*>::iterator rowIter)
{
static char rcsid[] = "$Id";
char temp_key[HCG_KEY_SIZE+1];
hcg_key COid;
unlink_child_bp_m(PR,CO,COid1,PRid1);
unlink_child_bp_m(PR,CO,COid2,PRid2);
COcurr = rowIter;
unlink_parent_bp_m(CO,DE,DEid,COid);
#ifdef DEBUG
COid = get_pkid();
decode(temp_key, &COid);
printf("The primary key deleted is %s\n",temp_key);
#endif
rowIter = COtab->removeRow(rowIter);
delete this;
return rowIter;
}
The key sections here are the two calls to unlink_child_bp_m, the call to unlink_parent_bp_m, and the call to the table’s removeRow method. The call to removeRow simply removes the row pointer from the table’s list of row pointers. The unlink_child_bp_m and unlink_parent_bp_m macros looks like this:
#define unlink_child_bp_m(ca,pa,p,c) \
while(this->getFirstChild_##c() != this->c##_Terminator()) \
{ \
(*getFirstChild_##c())->setParent_##p##(NULL); \
removeChild_##c( getFirstChild_##c() ) ; \
}
/* ca##curr is an iterator that is set in ca::delete_row */
#define unlink_child_bp_m (ca,pa,p,c) \
if(getParent_##p() != NULL) \
(getParent_##p())->removeChild_##c(ca##curr);
The unlink_child_bp_m macro simply keeps removing the first element from the appropriate child container in the parent until it is empty after setting its appropriate parent pointer to null. The unlink_child_bp_m macro calls the parent row object’s removeChild_c## method to send the actual private list object a remove message. The remove method of an STL container object takes care of the searching for values and removes all instances of a given value in a container.
For the most part, we were able to implement all of the changes in our phase 2 specification. However, there are four important notes to make.
1) The third argument for the last_child macro changed from "fkey that points upwards from child to parent" to "fkey that points downward from parent to child"
2) XXbegin and XXend were not defined to call the getFirstRow and getLastRow methods of the class as specified in the phase two specification because they need to remain values that can be assigned to. Therefore, they are now list iterators as the rest of the macros are.
3) The access macros (XX__YY, YY__ZZ) were not implemented due to time constraints
4) The del_row, insert_element, and free_table macros are obsolete and has been eliminated.
In order for the generated macros to support the new method of organizing table and row classes, any macros that referred to the former public embedded pointers were rewritten in order to refer to new table and row class methods. In addition, all references to the variables that were formerly referred to as global pointers to row objects were rewritten to use syntax supported by the new declaration of these variables as STL list iterators. The following macros were modified because of their direct reference to XX::next_ptr and XX::prev_ptr.
next_all, pr_find, find_str_loop_all, find_str_loop, find_key_loop_all, find_key_loop, find_fkey_loop_all, find_fkey_loop, find_int_loop_all, find_int_loop, table_loop_all, table_loop, pr_var_find, var_find_str_loop_all, var_find_str_loop, var_find_int_loop_all, var_find_int_loop, var_table_loop_all, var_table_loop, insert_element, link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, del_row (obsolete), and free_table (obsolete).
All references to parent pointers *_pp must be modified to call XX::getParent_ZZid() or XX::setParent_ZZid() depending on the specified action. The following macros were modified for this purpose:
link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m.
As was mentioned previously, these twelve macros were condensed down to link_child_bp_m, link_parent_bp_m, unlink_child_bp_m, and unlink_parent_bp_m Any macro that uses *_fpp or *_bpp was modified to use the child-row iterators in the row class:
next_child, last_child, child_loop, var_child_loop, var_last_child, link_child_bp_m, link_child_nobp_m, link_parent_bp_m, link_parent_nobp_m, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_bp_m, and unlink_child_nobp_m.
Any macro that uses *_bcp or *_fcp was modified to use the child-row iterators in the row class:
start_child_chain, first_child, child_loop, var_child_loop, var_first_child, link_child_1, link_child_bp_m, link_child_nobp_m, link_parent_1, link_parent_bp_m, link_parent_nobp_m, unlink_parent_1, unlink_parent_bp_m, unlink_parent_nobp_m, unlink_child_1, unlink_child_bp_m, and unlink_child_nobp_m.
The following macros also needed to be modified:
child_exists: There is no longer a (struct parent *). The parent is a class and it is type XXtable, not XX.
exists: This macro uses XXcurr to check if the table has any rows. However, using the row iterator in XXtable, we can check the row size to verify that the table contains rows.
pr_check_str: This macro attempts to access a row's private data, and it needs to be modified to call the row's get* method.
The access macros will need to be modified to call the row's getParent_YYid() method. For example:
#define XX__YY XX->getParent_YYid()
#define ZZ__YY ZZ->getParent_XXid()->getParent_YYid()
This work to change the access macros defined above was not completed.
Obsolete Functions/Methods/Macros
pr_parse(): The code from pr_parse() has been modified and moved into load_data(), as described above, making pr_parse() obsolete.
pr_link(): pr_link() is called only by load_data(). Moving its code into load_data(), renders it obsolete. Note that the only thing pr_link() does is call XX::link_row().
XX::create_row(): The new operator calls XXrow's constructor making create_row() obsolete. (See load_data() for an example.)
del_row: This macro is replaced by XX::removeRow() followed by the delete operator (see XXrow::delete_row()).
pr_del_bt(): This function is called only by del_row() which is obsolete. Hence, this function is also obsolete. Bear in mind that the rows are now stored in STL containers not btrees as this function suggests. Since pr_del_bt() is the only function in pr_delete.cpp, this file does not need to be generated by gencpp.
free_table: This macro is obsolete.
pr_get_int, pr_get_flt, pr_get_str, and pr_get_key: These four macros have been obsolete since the original conversion of chgen to gencpp. However, they were never removed.
pr_delete: This macro has been obsolete since the original conversion of chgen to gencpp.
pr_dump_row: This macro calls dump_row(). However, the dump_row() function does not exist in the generated files. It does, however, exist in the gencpp code. Regardless, there is no reason to continue writing this macro to the application's *.h file.
alloc_element: This macro is no longer needed now that we use the new operator to allocate space for a row. However, it is used in gencpp itself.
link_child_bp_m, unlink_child_bp_m, link_parent_bp_m, and unlink_parent_bp_m: All link and unlink macros that implement backpointers are obsolete.
Testing of this application was primarily centered around making certain that deleting rows from a table maintained the proper links between parents and children. Another concern of ours was that we wanted to make sure that in a scenario such as the one used as an example throughout this paper where a child row had two different parent rows, particularly of the same type, that we would be able to maintain these relationships correctly.
The schema used for our tests is pictured below:
In this schema, a department at a school can offer zero to many courses, but a course can be offered by exactly one department. A Prerequisite is an associative object representing the many to many relationship between two courses such that COid1 points to an advanced course which has a given COid2 as a basic prerequisite. In addition to these two tables, two simple tables were used to test error conditions. One of these tables is empty, and the other contains one row and has no parents or children. Here is the schema definition file used:
/********************************************************************/
/* File: test_course_prereq.sch */
/* Schema file used for testing STL list in gencpp */
/* Written by Katherine Miu & Mark Ottesen for 91.522 on 12/16/2000 */
/* */
/********************************************************************/
empty EM /* EM - Empty table for testing only */
{
EMid NA c8 1 /* unique pkey of empty */
test NA c8 0 /* test field */
}
onerow ON /* ON - table containing only one row for testing only */
{
ONid NA c8 1 /* unique pkey of onerow */
name NA c11 0 /* testing only */
}
dept DE /* DE - University Department */
{
DEid NA c8 1 /* unique pkey of dept */
name NA c20 0 /* dept name */
}
course CO /* CO - Course information */
{
COid NA c8 1 /* unique pkey of course */
DEid NA c8 1 /* foreign pkey of dept */
name NA c15 0 /* course name */
}
prereq PR /* PR - Prerequisite */
{
PRid NA c8 1 /* unique pkey of prereq */
COid1 NA c8 1 /* foreign pkey of course that has this prereq */
COid2 NA c8 1 /* foreign pkey of course that describes this prereq */
test NA t35 0 /* field used for testing only */
}
The main program that was used for testing tests the following activities:
1) Loading a database from disk to test the logic of building a set of tables
2) Adding a new child row to the prerequisite object, which tests all of the row construction logic, and all the parent-child linking logic
3) Looping through all five tables using table_loop to verify the results
4) Testing the ability to loop through both sets of children of a given parent to determine integrity of parent child relationships.
5) Tests our ability to determine the first and last elements of a container correctly.
6) Deleting a child row to correctly determine our ability to unlink a row from its parent rows as well as the table container. This also tests our row destruction capability
7) Tests our ability to find rows that have particular values by using find_str_loop
8) Verifies the integrity of child lists after deleting a row.
/*************************************************************************/
/* */
/* test_co_pr.cc: Test file for STL modifications */
/* */
/* Written by Mark Ottesen and Kathy Miu for 2kf522 */
/* */
/*************************************************************************/
#define MAIN
#include <string.h>
#include "test_co_pr.h"
#include "EMschema.h"
#include "ONschema.h"
#include "DEschema.h"
#include "COschema.h"
#include "PRschema.h"
/* The following defines are only necessary if attempting to use
** this file when USE_STL is undefined--i.e. to run the older version
** of gencpp.
*/
#ifndef USE_STL
#define generic_EMcurr EMcurr
#define generic_ONcurr ONcurr
#define generic_DEcurr DEcurr
#define generic_COcurr COcurr
#define generic_PRcurr PRcurr
#else
#define generic_EMcurr (*EMcurr)
#define generic_ONcurr (*ONcurr)
#define generic_DEcurr (*DEcurr)
#define generic_COcurr (*COcurr)
#define generic_PRcurr (*PRcurr)
#endif
//#include <string>
//#include <list>
//#include <iostream>
//using namespace std;
//void StripTrailingBlanks(string& mutant)
//{
// int lnspace = mutant.find_last_not_of(' ');
// if(lnspace != string::npos)
// {
// mutant.erase(lnspace + 1, mutant.size());
// }
//
//}
int main()
{
PR *newPRrow;
#ifdef USE_STL
printf("\n\n****USING STL CONTAINERS FOR TABLES AND ROWS--KMIU&MOTTESEN****\n\n");
#else
printf("\n\n****USING NTANSALA GENCPP****\n\n");
#endif
pr_init("test_co_pr.vdf", "test_co_pr.dat");
pr_load("DefaultView", "test_co_pr.dat");
pr_dump("DefaultView", "test_co_pr.out", 0, "w");
printf("\n*** TEST OF INSERTROW() & ADD_ROW & PR_CHECK_FKEY & PR_CHECK_STR ***\n");
#ifdef USE_STL
newPRrow = new PR;
#else
newPRrow = newPRrow->create_row(); /* class method */
#endif
// newPRrow->set_test("INSERT: Acc04's prereq is Acc03");
newPRrow->add_row("DefaultView");
printf("\n************************* TEST OF TABLE_LOOP ***********************\n");
table_loop("DefaultView", EM) /* on empty table */
{
}
table_loop("DefaultView", ON) /* on one-row table */
{
printf("ON: name is *%s*\n",generic_ONcurr->get_name());
}
table_loop("DefaultView", DE) /* dept table */
{
printf("DE: dept name is *%s*\n",generic_DEcurr->get_name());
}
table_loop("DefaultView", CO) /* course table */
{
printf("CO: course name is *%s*\n",generic_COcurr->get_name());
}
table_loop("DefaultView", PR) /* prereq table */
{
printf("PR: prereq test field is *%s*\n",generic_PRcurr->get_test());
}
printf("\n************* TEST CHILD_LOOP & FIRST_CHILD & LAST_CHILD ***********\n");
table_loop("DefaultView", DE)
{
child_loop(DE,CO,COid,DEid)
{
if (first_child(DE,CO,COid))
{
printf("\nCO children of DE %s (first_child)\n\n",generic_DEcurr->get_name());
}
printf(" %s\n",generic_COcurr->get_name());
#ifdef USE_STL
if (last_child(DE,CO,COid))
#else
if (last_child(DE,CO,DEid))
#endif
{
printf("Done (last_child).\n");
}
}
}
table_loop("DefaultView", CO)
{
child_loop(CO,PR,PRid1,COid1)
{
if (first_child(CO,PR,PRid1))
{
printf("\nPRid1 children of CO %s (first_child)\n\n",generic_COcurr->get_name());
}
printf(" %s\n",generic_PRcurr->get_test());
#ifdef USE_STL
if (last_child(CO,PR,PRid1))
#else
if (last_child(CO,PR,COid1))
#endif
{
printf("Done (last_child).\n");
}
}
child_loop(CO,PR,PRid2,COid2)
{
if (first_child(CO,PR,PRid2))
{
printf("\nPRid2 children of CO %s (first_child)\n\n",generic_COcurr->get_name());
}
printf(" %s\n",generic_PRcurr->get_test());
#ifdef USE_STL
if (last_child(CO,PR,PRid2))
#else
if (last_child(CO,PR,COid2))
#endif
{
printf("Done (last_child).\n");
}
}
}
printf("\n***** TEST FIND_STR_LOOP & DELETE_ROW & REMOVEROW() ***************\n");
find_str_loop("DefaultView", PR, test, "???????????????????????????????????")
{
#ifdef USE_STL
PRcurr = generic_PRcurr->delete_row(PRcurr);
#else
/* PRcurr->delete_row(); *//* This is a bug in the ntansala gencpp because delete_row()
* will free PRcurr which is the looping var used by
* find_str_loop
*/
PR *myPRcurr = new PR; /* kludge */
myPRcurr->next_ptr = generic_PRcurr->next_ptr; /* kludge */
generic_PRcurr->delete_row();
generic_PRcurr = myPRcurr; /* kludge */
#endif
}
table_loop("DefaultView", PR) /* prereq table */
{
printf("PR: prereq test field is *%s*\n",generic_PRcurr->get_test());
}
/* delete first CO row and set COcurr to next row */
#ifdef USE_STL
COcurr = COtab->getFirstRow();
COcurr = generic_COcurr->delete_row(COcurr);
if (COcurr != COtab->Terminator())
#else
generic_COcurr = CObegin;
generic_COcurr->delete_row();
generic_COcurr = CObegin;
if (generic_COcurr != NULL)
#endif
printf("CO name is *%s*\n",generic_COcurr->get_name()); /* part of preceding if stmt */
table_loop("DefaultView", CO) /* course table */
{
printf("CO: course name is *%s*\n",generic_COcurr->get_name());
}
printf("\n************* VERIFY CHILD LISTS AFTER DELETION ***********\n");
table_loop("DefaultView", DE)
{
child_loop(DE,CO,COid,DEid)
{
if (first_child(DE,CO,COid))
{
printf("\nCO children of DE %s (first_child)\n\n",generic_DEcurr->get_name());
}
printf(" %s\n",generic_COcurr->get_name());
#ifdef USE_STL
if (last_child(DE,CO,COid))
#else
if (last_child(DE,CO,DEid))
#endif
{
printf("Done (last_child).\n");
}
}
}
table_loop("DefaultView", CO)
{
child_loop(CO,PR,PRid1,COid1)
{
if (first_child(CO,PR,PRid1))
{
printf("\nPRid1 children of CO %s (first_child)\n\n",generic_COcurr->get_name());
}
printf(" %s\n",generic_PRcurr->get_test());
#ifdef USE_STL
if (last_child(CO,PR,PRid1))
#else
if (last_child(CO,PR,COid1))
#endif
{
printf("Done (last_child).\n");
}
}
child_loop(CO,PR,PRid2,COid2)
{
if (first_child(CO,PR,PRid2))
{
printf("\nPRid2 children of CO %s (first_child)\n\n",generic_COcurr->get_name());
}
printf(" %s\n",generic_PRcurr->get_test());
#ifdef USE_STL
if (last_child(CO,PR,PRid2))
#else
if (last_child(CO,PR,COid2))
#endif
{
printf("Done (last_child).\n");
}
}
}
return(0);
}
The evolution of gencpp seems to be moving toward generating more object-oriented code, and code that is more idiomatic of the C++ language. That being said, many of the macros that currently constitute the API can be rewritten as template functions. There are those, such as child_loop and table_loop, which are not actually functions but merely loop initialization statements. These would not necessarily be good candidates. However, there are some that can be reworked in this fashion.
The table class seems to me to be an ideal candidate for becoming a template class. In this implementation, there are no operations for a particular table type which are specialized for that particular table.
Another possibility for future work would be to investigate ways to make the syntax for dereferencing the global iterators a bit less messy. For one thing, having to dereference a pointer like this all the time
COcurr = (*COcurr)->delete_row(COcurr);
can be a bit awkward. A partial solution is at the top of the main test file, but it would be nice to have a more elegant solution. One possibility would be to have the nodes of the table and parent-child lists use a reference counted implementation of row objects wrapped in an outer class responsible for managing the references. James Coplein, in his book Advanced C++: Programming Styles and Idioms refers to this as a "singleton letter class".
This project would have been easier had the API macros enjoyed a further separation from the internals of the classes than they did when we started. There still is work to do in that regard. For instance, it used to be that when the file was loaded into memory from disk, no operations of the row classes were invoked. We have made changes in this direction. Furthermore, the row classes actually make calls out to macros to facilitate in the loading and unloading of data. We have changed the code so that adding a row no longer calls insert_element. However, it still calls out to link_parent_bp_m and link_child_bp_m to link a row to its parents and children. It would be nice to see the classes be able to operate independently of the macros generated in the database code. This way, the macros would be purely there as a convenience to client API programmers. It seems to me that over time the need for these macros will disappear as the row and table classes develop cleaner interfaces for client API programmers over time.
Appendix A - Porting Changes and Bug Fixes to gencpp
In order to get gencpp to build with Visual C++, the following fixes needed to be
made to the gencpp source:
General Fixes to gencpp Files
Compiler Errors and Warnings when Compiling gencpp Files
Solution: Use #ifdef WIN32 to include string.h on NT platforms.
Solution: Set return type of main to "int" and return 0.
Solution: Include prototypes.h.
Solution: Remove this variable declaration. Also, remove foundnum, parent_pkey, i, and j in gen_pr_parse().
Problem: p_lut_h static string cannot hold more than 2048 characters on WNT platform.
Solution: Split p_lut_h into p_lut_h and p_lut_h2. Add fprintf statement to print out p_lut_h2 in gen_defines.c and gen_lut.c.
Solution: Move embedded "\n\" to end of line: "/* extern void log_pr_add (char *, char *, hcg_ptr ); \n\ ntansala */ in gen_defines().
Problem: The string in one of the fprintf statements is longer than 2048 characters.
Solution: Split the string into two.
Solution: Include prototypes.h.
Solution: Remove this variable declaration in gen_macros().
Solution: Change "gen_pr_add_macros()" to "void gen_pr_add_macros(void)" to match its prototype. Do the same to gen_pr_link_macros(), gen_pr_unlink_macros, and gen_pr_free_macros.
Solution: Remove this variable declaration in gen_pr_link_macros().
Solution: Remove this variable declaration in gen_add_row().
Solution: Remove this variable declaration in gen_dump_table(). Also, remove temp_string, max declarations, and j.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_path().
Solution: Remove this variable declaration in gen_pr_add(). Also, remove foundnum, tmp, parent_pkey, and i.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_pr_link(). Also, remove tmp.
Solution: Fix gen_pr_create() definition to match its prototype.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_pr_delete(). Also, remove parent_pkey, and i.
Solution: Fix gen_pr_del_bt() definition to match is prototype.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_pr_dump(). Also, remove temp_string and j.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_log_do_add().
Solution: Remove this variable declaration from gen_log_do_set_int(), gen_log_do_set_flt, gen_log_do_set_key(), and gen_log_do_set_str().
Solution: Include prototypes.h.
Solution: Set gen_pr_stats() return type to "void".
Solution: Remove this variable declaration from gen_pr_stats().
Problem: p_lut_c static string cannot hold more than 2048 characters on WNT platform.
Solution: Split p_lut_c into seven strings. Add fprintf statements to print p_lut_c2 through p_lut_c7 in gen_lut.c and gen_pr_utils.c.
Problem: p_btree_c static string cannot hold more than 2048 characters on WNT platform.
Solution: Split p_btree_c into seven strings. Add fprintf statements to print p_btree_c2 through p_btree_c7 in gen_lut.c and gen_pr_utils.c.
Solution: Include prototypes.h.
Solution: Remove this variable declaration from gen_rr_matrix(). Also, remove k.
Solution: Remove this declaration from gen_structures().
Solution: Include prototypes.h.
Solution: Include prototypes.h.
Solution: Include prototypes.h.
Solution: Remove this declaration from parse_schema_chgen() and parse_schema_gendb().
Solution: Change "NULL" in "while (tempstr2[x] != NULL)" to " '\0' " in parse_scheme_gendb().
Solution: Add function prototypes at top of file for pr_init(), pr_load(), and pr_dump().
Solution: Remove this declaration from TT_to_tt(). Also, remove parse_state().
Solution: Include prototypes.h.
Solution: Add pr_del_bt prototype to top of file.
Solution: Remove declaration from pr_del_bt().
Solution: Include stdlib.h.
Solution: Add find_view_idx prototype to top of pr_dump.c.
Solution: Include prototypes.h.
Solution: Add meets_view prototype to top of pr_dump.c.
Solution: Remove declaration from pr_dump().
Solution: Add hcg_update_version prototype to top of pr_dump.c.
Solution: Include stdlib.h and malloc.h.
Solution: Include string.h (not strings.h) for WIN32 only.
Solution: Include prototypes.h.
Solution: Remove declaration from lut_insert_element().
Solution: Remove declaration from lut_delete_element().
Solution: Remove declaration from lut_get_name().
Solution: Remove declaration from btree_create().
Solution: Remove declaration from ver_decode().
Solution: Remove declaration from set_version().
Solution: Since fopen() is called to open a file, fclose should be called to close it. Replace two calls to "close" with "fclose" in pr_init().
Solution: Add btree_initialized prototype to pr_load.c.
Solution: Remove declaration from pr_init().
Solution: Replace bzero with memset. This code was already there, but memset was commented out.
Solution: Remove declaration from pr_add().
Solution: Replace close with fclose in pr_load().
Solution: Remove declaration from pr_find_bt().
Solution: Include stdlib.h.
Solution: Include prototypes.h.
Solution: Include prototypes.h.
Compiler Errors when compiling Generated Database Code
As was mentioned earlier, one of the first things we discovered when we started working on gencpp was that the source files that it generates would not compile without substantial alteration. The following are a list of compiler errors in the generated output files. Gencpp was updated so that it generates code that compiles without these errors.
Line of code in XXops.cpp.
link_child_nobp_m(...)
Actual problem is in link_child_nobp_m macro in SISxxx.h file. "(pa *) is not needed on WIN32.(pa *) ca##curr2->p##_fpp = pa##curr;
Solution: Fix gen_macros.c lines 396, 404, 430 to remove leading cast for WIN32 only.
Note: p##_fpp is type (void *).
Note2: This is also a problem on UNIX that we have not fixed, because this code is not compiled when using the STL version of the database code. Yet another good reason to trust the libraries shipped by your compiler vendor when they are available.
Line of code in XXops.cpp.
decode_retstr(value)
Actual problem is decode_retstr expects hcg_key* but "value" is type hcg_key.
Solution: Fix gen_ops.c line 119 to change "value" to "&value".
Line of code in XXops.cpp.
link_parent_nobp_m(...)
Actual problem is in link_parent_nobp_m macro in SISxxx.h file. "(pa *) is not needed on WIN32.
(pa *) ca##curr->p#3_fpp = pa##curr2;
Solution: Fix gen_macros.c to remove leading cast for WIN32 only. Search for link_parent_nobp_m to find the spot(s).
Line of code in XXops.cpp.
unlink_parent_nobp_m(...)
Actual problem is in unlink_parent_nobp_m macro in SISxxx.h file.
(pa *) ca##curr2->p##_fpp = pa##temp;
Solution: Fix gen_macros.c to remove leading cast for WIN32 only. Search for unlink_parent_nobp_m.
Line of code in XXops.cpp.
void FA::set_start_date (char *value)
;}
Actual problem is the extra semicolon.
Solution: Remove extra semicolon within switch/case ‘d’ statement in gen_set_ops() in gen_ops.c.
fprintf(XXops_fp[i],"void %s::set_%s (char *value)\n; ...
Solution: Remove extra semicolon within switch/case ‘d’ statement in gen_structures.c.
fprintf(XXschh_fp[j]," void set_%s (char *value);\n;...
Line of code in XXops.cpp.
strcpy(start_date, value, 26) ;
Actual problem: strcpy should be strncpy.
Solution: Make fix within switch/case ‘d’ statement in gen_set_ops in gen_ops.c.
Line of code in pr_dump.cpp.
exit(1);
Actual problem is pr_dump.cpp needs to include stdlib.h on NT.
Solution: Make the change in gen_pr_dump() in gen_pr_dump.c.
Line of code in pr_free.cpp.
free_table(...)
Actual problem is pr_free.cpp needs to include both stdlib.h and malloc.h on WIN32.
Solution: Make the changes in gen_pr_free() in gen_pr_free.c.
Line of code in pr_load.cpp.
for (i=strlen(s)-1; (i>=0) && ...
Actual problem is pr_load.cpp needs to include string.h and not strings.h on WIN32.
Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.
Line of code in pr_load.cpp.
test = atoi(name);
Actual problem is pr_load.cpp needs to include stdlib.h.
Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.
Line of code in pr_load.cpp.
key_compare(...)
This error may arise only due to the compiler warning setting on NT. However, the code fix is to change the line:
if ( (unsigned long)k1 > (unsigned long)k2 ) return 1;
to the following
return 1;
Solution: Make the change to gen_pr_utils() in gen_pr_utils.c.
See previous error.
Line of code in pr_load.cpp.
clear_mem(hcg_ts_list,sizeof(hcg_ts_list));
Solution: Cast hcg_ts_list to (char *) in gen_pr_init() in gen_pr_init.c.
Line of code in pr_load.cpp.
close(hcg_ascii_fp);
Actual problem is fclose() should be called because fopen() was used.
Solution: Make the changes (error occurs twice) to gen_pr_init() in gen_pr_init.c.
Line of code in pr_load.cpp.
close(hcg_ascii_fp);
Solution: Make the change to gen_pr_load() in gen_pr_load.c.
Line of code in pr_log.cpp.
log_sleep(milliseconds)
Actual problem is this is an old style declaration. Need to define __STDC__ when compiling on NT.
Line of code in pr_log.cpp.
status = usleep(&t);
Actual problem: missing prototype.
Solution: Make fix to gen_pr_log.c.
Line of code in pr_log.cpp.
struct sigaction newact, oldact;
Solution: recode for NT signal handling. This is NOT done.
Line of code in pr_stats.cpp.
exit(1);
Actual problem is pr_stats.cpp needs to include stdlib.h on NT.
Solution: Make the change in gen_pr_stats() in gen_pr_stats.c for WIN32. If this is needed for unix, then remove the win32 ifdef.
Line of code in SISxxx.h.
extern class SE *SEbegin, ... ;
Solution: Rename SISxxx.c to SISxxx.cpp.
Runtime Errors
Naturally, once we actually got gencpp to generate files that would compile and link, we couldn’t expect it to work properly right out of the gate, either. So, we had some runtime errors to fix as well. Here was the most graphic one.
XXid_fcp fields are not initialized prior to use. On UNIX in debug mode, new() initializes to 0, but not on NT, nor in optimized mode on UNIX.
Files were built by running "gencpp -ansi -metafile".
Occurs in XXops.cpp, XX::link_row() when macro link_parent_bp_m is used.
An Access Violation occurred with the following call stack: SC::link_row() pr_link() load_data() pr_load() main()
The actual line of code in link_row() is the macro link_parent_bp_m(ca,pa,cf,pk,p,c). The violation occurs as marked below. On NT, pa##curr2->c##_fcp (in the if statement) is non-null (garbage) after the new operator allocates pa##curr2. But on unix, it points to NULL. Regardless, it needs to be initialized before it is used. Had it been initialized, then execution would have proceeded into the "if" block instead of the "else" block.
if (pa##curr2->c##_fcp == NULL) \
{ \
pa##curr2->c##_fcp = ca##curr; \
ca##curr->p##_bpp = pa##curr2; \
} \
else \
{ \
ca##curr->p##_bpp = ca##temp; \
ca##temp->p##_fpp = ca##curr; \ <--violation
} \
This bug does not show up on UNIX in debug mode because the fields in pa##curr2 happen to point to NULL when pa##curr2 is allocated. In non-debug mode, the same behavior is observed. In other words, any application using "new" should initialize its allocated memory before using it.
Solution: XX::create_rows() must initialize fields to NULL. Can probably use a func similar to make_cptr when writing out create_rows() for each table.
Appendix B - Source and documentation locations
Listed below is an index of the locations of all source code and documentation relevant to this project. All paths listed are relative to $CASE/2kf522/kmiu
../gencpp_final_2000Dec/doc/phase1_example.doc contains an example Information Model for phase1 of the project
../gencpp_final_2000Dec/doc/phase1_projectV2.doc is a report on GENCPP Enhancement to Use Container Classes for phase 1
../gencpp_final_2000Dec/doc/phase2_specification.doc is the the functional and implementation specifications written for phase2
../gencpp_final_2000Dec/doc/phase2_state_diagrams.doc contains the state diagrams for the two new classes, table and row, used in our new version of gencpp
../gencpp_final_2000Dec/doc/final_report.doc is the final report that ties together all three phases of Object Oriented Analysis and Design. It discusses the pros and cons of both intrusive and non-intrusive data structures and explains our rationale for using non-intrusive data structures.
../gencpp_final_2000Dec/src contains all the source code for our new version of gencpp
../gencpp_final_2000Dec/test contains source code for the test we used to test gencpp. This dir also contains the *.sch, *.dat, and *.vdf files used for testing.
../gencpp_2000Nov30 contains source code for bug fixes and NT port of ntansala's version of gencpp. This source code does NOT contain our project enhancements other than the bug fixes and NT port.
Soukup Jiri: Intrusive Data Structures, article published in the C++ Report, in three parts between May and October 1998. (This article describes how to design a template library for intrusive data structures, and compares this library with traditional, container based class libraries).
Meyers, Scott: Effective C++, 2nd Edition, Addison-Wesley 1998, ISBN 0-201-92488-9
Page-Jones, Meilir: Fundamentals of Object-Oriented Design in UML, Addison-Wesley 1998, ISBN 0-201-69946-X
Lee, Richard and Tepfenhart, William: UML and C++ - A Practical Guild to Object-Oriented Development, Prentice Hall 1997, ISBN 0-13-619719-1
Eckel, Bruce: Thinking in C++ 2nd Edition, volumes I and II, Prentice Hall 2000, ISBN 0-13-979809-9
Lippman, Stan and Lajoie, Josee: C++ Primer 3rd Edition, Addison-Wesley 1998, ISBN 0-201-82470-1
Coplein, James: Advanced C++: Programming Styles and Idioms, Addison-Wesley 1992, ISBN 0-201-54855-0