/usr/proj3/case/gen/gen_enhancements17feb94
Revised 94/2/16 for 94sgen team projects - RJL

At the 94sgen meeting Mon Feb 14 1994, we focused on the items on the slide titled 'chgen/gendb operation - 7', RJL 1/26/94. These include:

gen1: Port gendb's schema format and parser to replace chgen's parser. Gendb should require descriptive comments in the schema like chgen does, and retain them to make the schema itself an adequate data repository. Without meaningful semantic definitions, the schema is purely a syntactic data model (including relation cardinality), not a semantic one.

gen2: Split chgen and/or gendb into two passes:
  Pass1: Populate tables TT and TA from the .sch file and pr/db-dump them.
  Pass2: pr/db-load these tables and replace hcg_* data structure refs with table refs.

gen3: Provide the metaschema (tables TT, TA and TS) as data to the application via schema.h (or pr/db-load). (Pr_create/pr_add cannot maintain uniqueness of pkeys unless it is aware of the pkeys already in use. The TS (TableStats) table should hold explicit pkey ranges and counts for each table version in the combined database. Pr_init can read table TS instead of scanning all .dat table versions before they are loaded. This will improve pr_init's efficiency by orders of magnitude. The TS table must be maintained by pr_add and pr_delete, which now maintain custom hcg_* structures for this purpose.)

gen4: Upgrade chgen's gen_* routines to generate new code that refers to tables TT and TA instead of the hcg_* structures in schema.h, and to generate a new pr_init procedure that also loads table TS (Statistics).

gen5: (New; same as topic 8 below, but using RCS instead of INGRES or another relational database to store files that contain all table data for a single object, with common version numbers throughout the file.) A constant version number causes RCS no false line-mismatch difficulties. If it is constant, it can also be promoted to the file level in Build 2 to conserve space.

II.
The following additional topics are from last year's file /usr/proj3/case/gen/gendb/gendb_enhancements.txt and supersede that file. I labeled them as High or Low Priority for chgen and/or gendb. They are all TBD since 93f523 had no gen project team.
----------------------------------------------------------------------

1. Support btree indices for a specified field or fields (TBD/Low, because the SYM project supports this for at most one data field besides the pkey). Another btree index could support indices on pkeys using another SymbolDictionary with pkeys in its SY table. Gendb.doc suggests indexing tables with specific fields as keys, e.g. 'create index on table students field lastname'. Some code for this exists in gendb but is disabled; it could be made workable. The chgen ver_8 and 94fsym projects contain different btrees whose code could be reused.

2. Support dense tables as arrays instead of lists (chgen/gendb). This might be based on the TS table, or on two passes of the schema that avoid inefficient dynamic array classes. I don't recommend this project because I can't easily estimate its size. It IS appropriate AFTER we get an effective versioning capability.

3. Use of templates from version 2.1 of C++ (TBD/Low, for gendb only). Gendb.doc proposes templates to handle user-defined classes. Templates might support direct mapping of a schema's table types and relations into compilable subclasses of gendb's generic table and field classes. Currently C++ knows nothing of database semantics. Because gendb's code is linked to the application and reads the schema at runtime, gendb's methods translate table and field NAMEs into access methods for them. Templates for tables will be ineffective if most methods must be overridden. Each table is UNIQUE in the relationships in which it participates (as parent or child) as well as in its attribute list.
Therefore it appears to me that each table must override any generic method that reads or updates its attribute and link-pointer values. It would be nice if C++ supported a generic string copy command to/from an instance of a row of ANY table type. If not, perhaps chgen could generate a C++ subclass specialized to each table (i.e. supply a class declaration for each table based on its schema prototype).

4. Versioning (TBD/Low priority, for gendb only). The version numbering and view definition feature of chgen could be improved and brought into gendb. Without it, gendb is impractical for CASE and CAD tools for large team projects. This is low priority in 94s523 because we are now considering using RCS for version control of .dat files that contain a single version of a complex object.

5. Add table TS (table statistics) to the schema database. See topics gen2 and gen3 above.

6. Comparative benchmarks between chgen and gendb (Low priority, for gendb and chgen). We can benchmark a single-version database or single-version view by submitting the view.dat file to applications linked to both tools. This requires schema compatibility. Manual schema conversion is easy but ought to be automated, unless one or both of chgen and gendb are revised to use a common schema format.

7. Data descriptions in gendb schema (TBD/High, for gendb only). Gendb should require descriptive comments in the schema like chgen does, and retain them to make the schema itself an adequate data repository. Without meaningful semantic definitions, the schema is purely a syntactic data model (including relation cardinality), not a semantic one. In topics gen2 and gen3, chgen's schema reader MUST retain these comments. Pass1 should be identical under gendb and chgen; this means gendb's Pass2 should accept optional comments in tables TT and TA.

8. Runtime access to a shared database (TBD/Low, for chgen and gendb; superseded by topic gen5 above). This is a good database course project.
Since gendb and chgen are single-user database applications, concurrency control is missing. Gendb.doc suggests building it into gendb. This can be done by coupling gendb's persistent file interface with a database server at application runtime. Code for interfacing with the INGRES RDB was developed as part of the 91.523 DDA projects, well before chgen was conceived. The most recent DDA work was in path /usr/proj3/case/91f523/xrf/base/doc/s91doc, which includes DDA docs and status reports. Sharing object (view) versions through INGRES solves the concurrency problem if (1) INGRES UPDATE permissions are given only to users who can access the GEN-based application, AND (2) version locking can be enforced through this interface, AND (3) GEN users never modify the same INGRES records directly. The locking mechanism should be provided by the INGRES server and not by chgen/gendb; I don't know if this is possible. [If .dat files are interfaces to an RDB server (e.g. INGRES), then at dbunload (version checkout) time the RDB must also emit the up-to-date TS table (see the case/91f523/xrf/base/doc files on DDA's dbload/unload).]

9. Table interface via a window within bde (chgen and gendb). Gendb.doc describes a print method that is directed to the screen or a file. This method ought to be directable to a different window via X11 or Motif, or to a scrollable popup window with selectable entries, like the FILE/open menu option. Generic methods that can be used as callback functions during window-managed editing ought to be developed. For example, this would permit online inspection of tables during interactive editing.

III. Extract from /usr/proj3/case/gen/gendb/gendb.doc:

Future Enhancements - This section discusses some of the future enhancements and work that could be done to the system. ....

The most beneficial enhancement that could be made would be the addition of indices to tables/fields. A new "create" statement syntax could be invented for the schema.
For example:

    create index on table students field lastname

Some initial work on indices already exists in the code, but is disabled.

Another enhancement would help to minimize the amount of space used to hold records at run-time. Currently, static, fixed-size arrays are used to hold various aspects of each record. A better but more complex approach would size the array according to the data that will be stored there. This should not be implemented with a dynamic array class (for example), due to the poor performance of such classes. Instead, the schema should be analyzed in a "first pass" of the schema file (to determine how many relationships will finally exist for each table). In a second pass, the arrays for each case can be sized appropriately.

The area of user-defined types could be improved significantly by a developer more fluent in C++, perhaps using parameterized classes (templates) introduced in version 2.1 of the C++ language proper.

A sound method to support versioning of data-sets must be designed. This problem exists for both CHGEN and GENDB. Only then can a good implementation be developed.

Benchmarks could be developed that compare CHGEN and GENDB. Sample applications could be specified and then implemented in both systems. The benchmarks would have to be fair, and push the limits of performance AND functionality in both systems for a proper comparison. One advantage to this enhancement is that it doesn't actually change either of the systems, and it may make a nice project for a student (or team).

Finally, production quality DBMS features such as concurrency, locking, transactions, and recovery management could be added (although this is clearly not a simple task).
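The field index proposed in this extract and in topic 1 of part II ('create index on table students field lastname') can be pictured as a secondary index mapping a field value to the pkeys of the matching rows. The following is a hypothetical sketch, not gendb's disabled index code: the Student row type, the FieldIndex class and indexByLastname are invented for illustration, and std::multimap (an ordered balanced search tree, typically red-black) stands in for the btree the memo mentions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical row type for the 'students' table; pkey is its primary key.
struct Student {
    long pkey;
    std::string lastname;
};

// Secondary index on one field: field value -> pkeys of the matching rows.
// std::multimap stands in for a btree here: keys stay sorted, lookups take
// logarithmic time, and duplicate values (two Smiths) are allowed.
class FieldIndex {
public:
    void insert(const std::string& value, long pkey) {
        index_.emplace(value, pkey);
    }

    // Remove one (value, pkey) entry, e.g. when its row is deleted.
    // Returns false if that entry was not indexed.
    bool erase(const std::string& value, long pkey) {
        auto range = index_.equal_range(value);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == pkey) {
                index_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Pkeys of all rows whose field equals value; equal keys keep insertion order.
    std::vector<long> find(const std::string& value) const {
        std::vector<long> pkeys;
        auto range = index_.equal_range(value);
        for (auto it = range.first; it != range.second; ++it) {
            pkeys.push_back(it->second);
        }
        return pkeys;
    }

private:
    std::multimap<std::string, long> index_;
};

// What 'create index on table students field lastname' might build when run.
FieldIndex indexByLastname(const std::vector<Student>& students) {
    FieldIndex idx;
    for (const Student& s : students) {
        idx.insert(s.lastname, s.pkey);
    }
    return idx;
}
```

For rows {1, "Smith"}, {2, "Jones"}, {3, "Smith"}, find("Smith") yields pkeys 1 and 3 without scanning the table. The index must be updated on every add and delete, which is the same maintenance burden the memo assigns to pr_add and pr_delete for the TS table.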
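The "first pass / second pass" array sizing described in this extract (and in topic 2 of part II) might look like this in outline. It is a hypothetical sketch, not gendb code: it assumes pass 1 has already parsed the schema's relationships into a list of (parent, child) table-name pairs, and the Relationship and TableLayout types and the countLinks/sizeLayouts functions are invented for illustration. Each table's link array is allocated once at its final size as a plain heap array rather than grown through a dynamic array class.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical parsed schema relationship: parent table -> child table.
struct Relationship {
    std::string parent;
    std::string child;
};

// Per-table link slots, sized exactly once after pass 1.
struct TableLayout {
    std::size_t linkCount = 0;          // relationships this table takes part in
    std::unique_ptr<long[]> linkSlots;  // one slot per relationship
};

// Pass 1: count the relationships each table participates in (parent or child).
std::map<std::string, std::size_t> countLinks(const std::vector<Relationship>& rels) {
    std::map<std::string, std::size_t> counts;
    for (const Relationship& r : rels) {
        ++counts[r.parent];
        ++counts[r.child];
    }
    return counts;
}

// Pass 2: allocate each table's array at its final size; it is never resized.
std::map<std::string, TableLayout> sizeLayouts(const std::vector<Relationship>& rels) {
    std::map<std::string, TableLayout> layouts;
    for (const auto& [table, n] : countLinks(rels)) {
        TableLayout& layout = layouts[table];
        layout.linkCount = n;
        layout.linkSlots = std::make_unique<long[]>(n);  // value-initialized to 0
    }
    return layouts;
}
```

With relationships dept->students, students->grades and courses->grades, pass 1 gives students and grades two links each and dept and courses one each, so pass 2 allocates exactly six slots in total. A table that relates to itself counts once as parent and once as child and therefore gets two slots.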
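The TS (TableStats) table proposed in gen3 of part I (and topic 5 of part II) can be modeled in memory as one row of pkey range and count per (table, version). This is a hypothetical sketch, not chgen's pr_* code: the TableStats and StatsTable types and the load/addRow/deleteRow methods are invented for illustration, standing in for the roles gen3 assigns to pr_init, pr_add and pr_delete. Keeping the upper end of the range as a never-shrinking high-water mark, so that deleted pkeys are never reused, is likewise an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical TS row: pkey range and row count for one table version.
struct TableStats {
    long lowPkey = 0;   // first pkey allocated (0 = none allocated yet)
    long highPkey = 0;  // high-water mark: largest pkey ever allocated
    long count = 0;     // rows currently present
};

using TableVersion = std::pair<std::string, int>;  // (table name, version)

class StatsTable {
public:
    // Init-time load of one TS row, instead of scanning the .dat files.
    void load(const TableVersion& tv, const TableStats& s) { ts_[tv] = s; }

    // Add: hand out a pkey above every pkey ever issued for this table version.
    long addRow(const TableVersion& tv) {
        TableStats& s = ts_[tv];
        long pkey = s.highPkey + 1;
        if (s.lowPkey == 0) s.lowPkey = pkey;
        s.highPkey = pkey;
        ++s.count;
        return pkey;
    }

    // Delete: only the count changes; the range stays a bound, so the
    // deleted pkey is never handed out again. Returns false if nothing to delete.
    bool deleteRow(const TableVersion& tv, long pkey) {
        auto it = ts_.find(tv);
        if (it == ts_.end() || it->second.count == 0) return false;
        TableStats& s = it->second;
        assert(pkey >= s.lowPkey && pkey <= s.highPkey);  // must be a pkey we issued
        --s.count;
        return true;
    }

    TableStats stats(const TableVersion& tv) const {
        auto it = ts_.find(tv);
        return it == ts_.end() ? TableStats{} : it->second;
    }

private:
    std::map<TableVersion, TableStats> ts_;
};
```

If TS says students version 1 holds 37 rows with pkeys in 1..40, the next add returns pkey 41 immediately, without reading any table data. Because the range only bounds the pkeys in use, it is cheap to maintain on every add and delete, at the cost of never reclaiming gaps.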