RJLRef: $CASE/99s523/chgen_fdbk.990315

This contains ~/99s523/chgen_status.990314 from JK/KS, with edited
comments inserted by RJL on 990315.

Joe Karner and Keith Spinney have done a good job exposing their
understanding of the chgen project, and I have corrected some of their
misunderstandings below.  RJL
=================================================
From Joseph_Karner@ne.3com.com Mon Mar 15 06:53:49 1999
From: Joseph_Karner@ne.3com.com
To: Bob Lechner
Date: Mon, 15 Mar 1999 06:53:06 -0500
Subject: 99s523 CHGEN status report

Hi Dr. Lechner -

Status of 99s523 CHGEN project - 3/14/99
Project Members: Spinney / Karner
Prepared for: Dr. Lechner

[RJL: I edited this to word-wrap more readably in 80-col text.]

This status report reflects the results of an analysis of CHGEN and the
proposed HCG-structure migration project.  Input sources included
versions 7, 8, and 10 of CHGEN, and their respective project
documentation.  We also used the Housedb test application and the
hcg-struct-migration.ppt PowerPoint presentation.

[RJL: The proper starting point is the latest revision gen/ver_10log.]

Attached to this report is a two-slide Office97 PowerPoint presentation
reflecting our understanding of the metadata migration.  Please let us
know if you have any problem reading this presentation; if so, perhaps
we could fax it to you (please include your fax number).

[RJL: I can't maintain a fax print; I need .ppt or .bde diagrams.  JK
and KS can save files in the new directory $CASE/99s523/chgen, which is
group-writable for group 98s523 (99s523 students are in this group).]

We made a few changes to the data models from the
hcg-struct-migration.ppt slides.  Where we saw migration from hcg
tables to the new model, we indicated this condition with a number in
the diagram box.  As an example, hcg_view_list in the hcg table slide
is represented by the SV type in the migrated model; both are indicated
with a 3.

Our changes, along with some questions, are:

1. On slide 4, under SV - Schema Version, a data field of num_tables is
present.  We changed this to num_views and added a pointer to the TT
tables.

[RJL: NO, this is wrong - SV and VV do NOT have semantically equivalent
version number attributes.  VV content is specialized to functions
WITHIN an application that run concurrently, such as conversion from
input to output sub-views of a FIXED schema, while an SV change is a
higher meta-level DB format change (e.g. adding gale XB in bde); SV
changes occur very slowly because they require a database population
conversion.]

[RJL: SV has num_tables to define the size of the array that can
contain table TT, if we extend chgen to generate code for array-type
containers as an alternative to list containers.  In the case of TT,
the schema defines the range of ttabbrev values (a subrange of
[0..255]).  (Of great importance is verifying and maintaining the
correspondence between ttabbr values and the ttidx values now used to
index into hcg_structs.  We do not know if this can be asserted or is
sometimes violated, but there is no known reason why these two mappings
should not be the same.)]

2. We removed the pointer from VV ViewVersion to TS TableStats.
Although TT and VV were pointing to this table in slide 4, there are
different cardinalities associated with these two tables.

[RJL: This seems erroneous: you may be confusing 1:M relation arrows
with 'pointers'.  A 1:M parent-child link on the schema (and its fkey
in the child table) IMPLIES a 1:M 2-way linked list with ptrs (_fcp,
_bcp, _fpp, _bpp) that schema.h declares and pr_*.c maintains in
VMNetDB (the memory-resident copy of the database).  Two such links
(from TT and VV) to an associative entry TS define an M:N relation
between VV and TT.  The cardinalities need NOT agree (except that
VV-->TS has exactly num_tables_in_version children, because there are
this many fkeys in TS that identify these tables in TT, and vice
versa).]

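The M:N structure described in the comment above can be sketched in C
roughly as follows.  This is a minimal illustration under stated
assumptions, not the code chgen generates: the field names (first_ts,
next_in_vv, and so on) and the helper functions are hypothetical and
are NOT the _fcp/_bcp/_fpp/_bpp declarations that schema.h contains.
It only shows how one associative TS row, chained into two separate
two-way lists, lets VV and TT have different child counts.

```c
#include <assert.h>
#include <stddef.h>

struct TS;  /* associative entry, defined below */

/* Parent 1: a view version (hypothetical fields). */
struct VV {
    int version_no;
    struct TS *first_ts;    /* head of this version's child list */
};

/* Parent 2: a table type (hypothetical fields). */
struct TT {
    int ttabbr;
    struct TS *first_ts;    /* head of this table's child list */
};

/* Child of both parents: one (version, table) pairing. */
struct TS {
    struct VV *vv;                       /* parent link for the VV fkey */
    struct TS *next_in_vv, *prev_in_vv;  /* peers under the same VV */
    struct TT *tt;                       /* parent link for the TT fkey */
    struct TS *next_in_tt, *prev_in_tt;  /* peers under the same TT */
};

/* Insert ts at the front of both parents' two-way child lists. */
static void ts_link(struct TS *ts, struct VV *vv, struct TT *tt)
{
    assert(ts != NULL && vv != NULL && tt != NULL);

    ts->vv = vv;
    ts->prev_in_vv = NULL;
    ts->next_in_vv = vv->first_ts;
    if (vv->first_ts != NULL)
        vv->first_ts->prev_in_vv = ts;
    vv->first_ts = ts;

    ts->tt = tt;
    ts->prev_in_tt = NULL;
    ts->next_in_tt = tt->first_ts;
    if (tt->first_ts != NULL)
        tt->first_ts->prev_in_tt = ts;
    tt->first_ts = ts;
}

/* Number of tables that take part in one view version. */
static int vv_num_tables(const struct VV *vv)
{
    int n = 0;
    const struct TS *ts;
    for (ts = vv->first_ts; ts != NULL; ts = ts->next_in_vv)
        n++;
    return n;
}

/* Number of view versions in which one table takes part. */
static int tt_num_versions(const struct TT *tt)
{
    int n = 0;
    const struct TS *ts;
    for (ts = tt->first_ts; ts != NULL; ts = ts->next_in_tt)
        n++;
    return n;
}
```

Linking three tables into one version and only one of them into a
second version gives child counts of 3 and 1 on the VV side and 2, 1, 1
on the TT side, so the two 1:M links need not have matching
cardinalities.
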
[RJL: There may indeed be a schema error in the info model; if one was
present, your change may have corrected it.  (I do not have your .ppt
slides to look at, so I cannot tell; any such error must be discussed
first.)]

3. We added the versionNo[NCG_NUM_TABLES] to VV.  This is required to
provide the version number associated with a particular table in a
particular view, and is consistent with the hcg metadata shown in
slide 3 of hcg-struct-migration.ppt.

[RJL: You can't add an ARRAY to a chgen-application table - pr_* utils
only support linked lists.  These need fkeys in persistent storage.]

[Since we want to bootstrap chgen ver_11 by making it an application of
chgen 10log, we want the metaschema to become the basis for ver_11's
internal meta-database, when ver_11 is built as an application of
ver_10log.]

4. We preserved the link from TT to TA and enumerated the fields we
determined necessary in TA.

[RJL: ??? Changing the metaschema SV-->TT-->TA definitions requires all
prior applications to conform to the new schema.sch and TT-->TA format;
this is more disruptive than changing its content, which defines the
application schema.  Where is the comparison to the current schema
format and meta-attributes in TT and TA, and the justification for
changing them?]

5. We added the hcg_table_seqlist to the Current hcg_table hierarchy
slide, as this structure was discovered during the code analysis.

[RJL: No comment - no access to the details from here.]

This [above] addresses the refactoring of the metadata.

[New Topic:]

It is our understanding that you also want us to refactor the actual
data storage structures.  For example, in the house database
application, the data associated with a buyer is stored in the BUYR
data structure shown below.  It, and all other such structures, will be
replaced with two data structures, tr_type and av_type, also shown
below.

[RJL: Wow - this is a wild stretch from my original intentions.
Besides which, this has already been implemented in gendb (which see).
It implies runtime interpretation as I see it, with an unacceptable
performance penalty.  Also a worst-case string size for each TA-field.
Look at the gendb code to see what I mean.]

[The purpose of chgen is exactly to tailor each class during code
generation so that runtime interpretation of lists or even arrays is
not required.  With runtime interpretation, speed suffers, and
homogenizing the field value containers requires union types to share
space, and string pool management to avoid worst-case string buffer
allocations in ta_type.]

[More details and corrections below - RJL]

[The following is N/A because gendb did the same thing before.]

struct BUYR    /* table of home buyers */
{
    hcg_key BUYRid;        /* primary key field */
    hcg_key LLOTid;        /* foreign key to lot table */
    char bname[31];        /* homebuyers name */
    char baddr[31];        /* homebuyers addr */
    char bphone[31];       /* homebuyers phone # */
    struct LLOT *LLOTid_pp;
    struct dummy_type *LLOTid_fpp;
    struct BUYR *prev_ptr;
    struct BUYR *next_ptr;
};

/*********************************************************/
/* Table row type                                        */
/* - this contains a structure which represents one row  */
/*   (tuple) of a table                                  */
/*********************************************************/
struct tr_type
{
    struct av_type *av[MAXATTRIBUTES];
                    // allows for use of indices instead of
                    // walking a linked list for tuple.
};

/*********************************************************/
/* Attribute Value type                                  */
/* - this contains a structure which represents one      */
/*   attribute value of a row (tuple) of a table         */
/*********************************************************/
struct av_type
{
    int int_data;
    char char_data;
    char string_data[NAMELENGTH];
    struct av_type *next_ptr;
                    // allows for walking a linked list.
};

We believe this supports the following, which you mentioned as one of
our goals:

"put table over method instead of method over table:
( Table = struct --> class)"

[RJL: NO - Here is what I REALLY intended by restructuring chgen code
outputs (schema.h and pr_*.c):]

[The structure chart (call tree) of chgen shows that it calls a
sequence of gen_*.c modules, each of which has a loop over all TT's and
builds a switch statement with one case per table type.  This puts
function over datatype, as in traditional code.  OO code should be
built with class (struct or table type) OVER/BEFORE method=function.
That way, only one schema_tt.h need be included in each methods_tt.c
file.  (Each fkey link probably implies 2-way friendship attributes.)]

[RJL: One MAIN LOOP over table-type tt should open schema_tt.h and
pr_util_tt.c and generate ALL methods for ONE table type (tt) at a
time.  (The first implementation could repeat chgen's sequence of calls
to gen_* modules for each table type, opening a distinct file for each.
Then make sure that only ONE table gets processed by all gen_* modules,
so only one switch case is applicable.  The other cases are superfluous
but harmless.)]

This will allow for modelling independent data structures.  It also may
eliminate the need for chgen to generate application-specific code.
Rather, the schema.sch file could be parsed upon execution of the
application code to determine the contents of the metadata structs, and
the input network database could be read to populate the tr_type and
av_type internal data structures.  Therefore this raises the question -
do we really need chgen, or do we simply need a library of functions
and macros which are unchanged from application to application, which
can simply be linked to the application code?

We plan on continuing with the assumption that chgen will produce
application-specific code.

[RJL: this is CORRECT.
Please reply re: the possibility of partitioning the gen_* modules'
control flow to produce separate .h and .c files for each table type.]

Next Steps
----------

A. Using the house database application, we would like to modify the
house code to implement the new TT/TA metadata format using ghost
variables (i.e. do everything the old way as well as the new way) and
verify results.

[Note: We feel this is a necessary first step due to the complexity of
the code modifications we are attempting.  It is very difficult to
modify code that in turn develops code that in turn is used to support
a database application.  We have found that there are too many levels
of abstraction to effectively comprehend what we need to do.  Rather,
we plan to make the changes to a specific application first, then use
that application as a model to guide us in the modifications necessary
for chgen.]

[RJL: I believe the goal should not change: NO code needs to be
removed, and ALL refs can continue to depend on the old structures.
Code additions should preserve the current (gen_replay) set macros,
which do duplicate updates.  The NEW task is to define and generate the
run-time hcg_struct-to-ghostVar_in_TTTA correspondence map.  Later
projects can migrate clients (pr_* and macro callers) to reference the
new TT-->TA database (pr_loaded into the application, of course).  The
99sgen project need not worry about this.]

[Perhaps this task should be reduced to writing a SPEC without
implementation; you too will have valuable insight that would help to
write such a spec.]

[I believe that task defs B, C, D and E are now obsolete - RJL]

B. Implement the associated data structure changes in the house
database application (tr_type and av_type).

C. Verify the changes by running the application and checking that the
ghost output is the same as the original output.

D. Modify chgen to support the TT/TA metadata and the tr_type and
av_type data storage.

E. Verify the modified chgen by creating the house files using
house.sch, running with the user-developed house.c, and checking that
the output files are the same as those from step C.

[gen version merging:]

F. As a companion task, we will pursue the merging of ver_10, ver_10.2,
ver_10log, and ver_10replay.

[RJL: This merge is not hard; can you do it first?  (Ignore ver_10.2,
gen_dltVals and gen_dfltSuper, which are out-of-sequence side
branches.)  Merge ver_10, 10log and 10replay, which are a direct
sequence and should involve no merge conflicts to resolve.  The main
problem is importing them as sequential revs into CVS.]

This appears to be a huge amount of work.  Simply performing the
changes to chgen will require a complete line-by-line review and
possible re-write of most of the code.

[RJL: It is mostly code rearrangement with multiple file generation.
This is also why merging should be done first - code merging would be
practically impossible after one version has undergone a major control
flow rearrangement.]

Since we plan on graduating this May, we are very concerned about the
likelihood of accomplishing all of this.  Therefore, we would like to
get your feedback.  We would like to meet with you to discuss this
project plan, as soon as it is convenient.

Thanks,
Joe & Keith

(See attached file: tt-ta metadata.ppt) (omitted - RJL)

[RJL: This is a fine start on defining the project, Joe and Keith.  I
will be at UML Tues PM after a 12 noon lunch.  I'll probably be there
on a later day next week also.  Please give me your impressions and
questions on my comments above.]

Thanks
Bob Lechner
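
The control-flow rearrangement RJL describes in his comments on the
"table over method" goal (one main loop over table types, with every
generation step run for one table type before moving to the next) can
be sketched roughly as follows.  This is only an illustration under
stated assumptions: the step functions, the function names, and the
"schema_<tt>.h" / "pr_util_<tt>.c" spelling of the per-table file names
are hypothetical, and the sketch writes to a single stream instead of
opening one file per table type as a real generator would.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One generation step emits code for exactly ONE table type. */
typedef void (*gen_step_fn)(FILE *out, const char *tt);

/* Hypothetical stand-ins for chgen's gen_* modules. */
static void gen_struct_decl(FILE *out, const char *tt)
{
    fprintf(out, "struct %s;  /* declaration would go here */\n", tt);
}

static void gen_reader(FILE *out, const char *tt)
{
    fprintf(out, "/* read routine for %s would go here */\n", tt);
}

/* Build the per-table output file names (assumed naming scheme). */
static void table_file_names(const char *tt, char *hdr, char *src,
                             size_t cap)
{
    assert(tt != NULL && hdr != NULL && src != NULL);
    assert(strlen(tt) > 0);
    snprintf(hdr, cap, "schema_%s.h", tt);
    snprintf(src, cap, "pr_util_%s.c", tt);
}

/*
 * Table OVER method: the outer loop runs over table types, the inner
 * loop over generation steps.  Returns the number of step calls made.
 */
static int generate_per_table(const char *const tables[], size_t ntables,
                              const gen_step_fn steps[], size_t nsteps,
                              FILE *out)
{
    int calls = 0;
    size_t t, s;

    for (t = 0; t < ntables; t++) {
        char hdr[64], src[64];

        table_file_names(tables[t], hdr, src, sizeof hdr);
        fprintf(out, "== would write %s and %s ==\n", hdr, src);
        for (s = 0; s < nsteps; s++) {
            steps[s](out, tables[t]);
            calls++;
        }
    }
    return calls;
}
```

Swapping the two loops gives the method-over-table order that the
comments attribute to the existing gen_* modules, where each module
visits every table type in turn.
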