RJLRef: $PH/06f522/asgnt3/CandidateKeyProcesssing061010.txt

You should also review my revised file:
  $PH/COOL-GEN/TTTA_metadata_jk_rl.ppt
The revisions are relevant to CK and CA.

[RJL:]
I congratulate Chris for attempting to define such a sophisticated
method in the pseudo-code below. Thanks also for inputs from Anthony Gabrielson.
I hope my notes below will help.

The chgen/pr_*.c learning curve is steep, but its cost can be amortized
over many apps. The more that mundane code is standardized, the more
time is available to work on the unique aspects of the application.
(I might add that mundane aspects get squeezed out of well-paying
job descriptions.)

> From cguffey@gmail.com Tue Oct 10 11:00:19 2006
> From: "Chris Guffey"
> To: "Bob Lechner"
> Subject: Re: 91.522 HW3 - prototype suggestion for keyExists
> Cc: alison_lea@yahoo.com, kbagley@us.ibm.com, agabriel@cs.uml.edu,
>     nitin.sonawane@verizon.net, "Nitin Sonawane"
>
> Thanks Prof. Lechner, I think those last two emails cleared up some things.
>
> 1) When I say a 'specific TT', I mean a row in the TT table. I was
> assuming it was an object of some sort that we'd have direct access
> to. Now I'm thinking we only have indirect access to it?

Iff SV, TT and TA are prefixed to your application schooldb.sch, then the
pr_*.c API treats them just like any other accessible Entity type
(direct access). Chgen also has that access at code generation time,
although I don't know if it does anything with .msdat content.

>
> 2) You talked previously about using pr_parse, however I do not see an
> entry for it in the chgen manual.
>

Do you know about grep? (man grep).
E.g., in my metatest2/src directory:
-------------
saturn.cs.uml.edu(31)> pwd
/tmp_mnt/nfs/galaxy/faculty/fac1/lechner/public_html/06f522/asgnt3/lechner/metatest2/src
saturn.cs.uml.edu(32)> grep pr_parse *.h *.c
pr_load.c:1539: void pr_parse(); /* 93su523 PGEN merge */
pr_load.c:1636: /* idx is cursor for during pr_parse below, in while looop */
pr_load.c:1637: pr_parse(viewname, hcg_buffer, tbl_encoding, idx, hcg_k);
pr_load.c:1657:/* start of gen_pr_parse output to pr_load.c */
pr_load.c:1668:void pr_parse (char viewname[], char buffer[],
pr_load.c:1782:} /* end pr_parse */
pr_load.c:1784:/* end of gen_pr_parse output to pr_load.c */
saturn.cs.uml.edu(33)>
---------------

Metatable (.msdat) content can be used inside chgen, while chgen is
running, to generate code that will (when compiled) be able to pr_load
and process appDB content. That is because pr_parse is also available
to chgen at code-gen time: note that chgen/src runs gen_pr_parse on its
input .sch file, while it is able to call pr_parse on its
metaschema.msdat content.

[To programmers not yet exposed to compiler bootstrapping this is
strange, which is one reason why I believe applying chgen is a useful
learning experience.]

[Chgen/src typically has gen_pr_whatever() functions that generate
their pr_whatever() counterparts, which work on application-specific
data content. Since genv10, chgen has had internal metaschema tables,
although it still works on more complex representations of metadata,
as discussed in the genv10log report/manual and in slides 1-4 at
these URLs:
http://www.cs.uml.edu/~lechner/COOL-GEN/CandidateAndSurrogateKeys.*
http://www.cs.uml.edu/~lechner/COOL-GEN/NamespaceAndSchemaIntegration.htm]

E.g.
pr_parse is defined (but not yet used) in my chgenv13 checkout tree:
---------------
saturn.cs.uml.edu(54)> pwd
/tmp_mnt/nfs/galaxy/misc/proj3/case/gen/ver_13/chgen/src
saturn.cs.uml.edu(55)> grep pr_parse gen*.c
gen_load_data.c:37:void gen_pr_parse(); /* generates function pr_parse() */ /* 93su523 PGEN merge */
gen_load_data.c:77: void pr_parse(); /* 93su523 PGEN merge */ \n\
gen_load_data.c:178: /* idx is cursor for during pr_parse below, in while looop */\n\
gen_load_data.c:179: pr_parse(viewname, hcg_buffer, tbl_encoding, idx, hcg_k);\n\
gen_load_data.c:201: fprintf(prload_fp, "/* start of gen_pr_parse output to pr_load.c */\n");
gen_load_data.c:202: gen_pr_parse(); /* 93su523 PGEN merge */
gen_load_data.c:203: fprintf(prload_fp, "/* end of gen_pr_parse output to pr_load.c */\n");
gen_load_data.c:211:void gen_pr_parse() /* generating function pr_parse() */
gen_load_data.c:228:void pr_parse (char viewname[], char buffer[],\n\
gen_load_data.c:323: fprintf(prload_fp,"} /* end pr_parse */\n\n");
gen_load_data.c:324:} /* end gen_pr_parse() */
gen_pr_log.c:908: fprintf(prlog_fp,"/* the following switch statement is adapted from the one in pr_parse */\n");
saturn.cs.uml.edu(56)>
---------------

RJL comments on pseudo-code below:
-----------------------------
1. (Read chgen user manual): pr_find only searches for primary keys.
Therefore this would find the CK-table row with pkey CKpk, after
encode("CK000001", CKpk) converts the pkey from ASCII to uint:
   pr_find(CK, CKid, CKpk);       // updates CKcurr
but to search table TT for the row that declares table CK requires
   pr_find_str(TT, TTabb, "CK");  // updates TTcurr
(Note that results are returned through side effects of these calls.)
------------------
2. pr_parse has a specific definition, function and output data
structure for each table type. See the chgen User Manual, and perhaps
pr_load.c for any schema, to see how it works.
Moreover it is only defined to work for a particular field sequence
that must be declared in metaschema.sch and captured in table TT as a
table-row definition.

Therefore, if pr_parse code reuse is worthwhile, by all means declare
the test data record format as an add-on to schooldb.sch. You might be
glad you did, since all pr_*.c functions, not just pr_parse, can then
load and access test data record content.

[Caveat: the data type of each field is specified in table TA.
Therefore, homogenizing the value fields of each CK test query
requires either
(A) declaring all attribute types in the value list to be text strings,
    which are converted by atoi or atof within your test code; or
(B) declaring a separate entity type for each query type, depending on
    its field type sequence (e.g. QA, QB, etc.). Then pr_parse finds out
    when to apply atof or atoi to each input value field by discovering
    which specific table type has been read (as identified by the
    primary key of that query type).]

[QA, QB, etc. could be declared as subclasses of a generic test query
table GQ. For asgnt3 this is not worthwhile except as a learning
experience for chgen-style delegation, which currently replaces
inheritance.]

[Test table schema declarations become even more valuable when you use
Q? types in regression tests: evolving test data really justifies a
versioned database, which the pr_* API is designed to handle.]
-----------------------
3. I did not check your encode call args for correct datatypes, but the
NAME tt is not correct - use TTcurr and TTid as chgen's metaschema
requires. Schemas define table abbrevs as UPPER CASE. Schema.h and
pr_*.c macros derive all variable names from these.

child_loop(TT, CK, nCKid, nTTid) will fail because nCKid and nTTid are
local int (uint?) vars, not schema field NAMES. Look at child_loop's
definition (grep child_loop in pr_*.c and *.h) to appreciate this.

CKcurr->value makes no sense unless CK has a field called value.
The test (v[i] != CKcurr->value) makes no sense either.

[There are two levels of operation:
(a) CK specs are defined as field name lists (possibly in the schema)
    before code generation, and
(b) runtime tests process queries (e.g. QA) with value lists.]

pr_parse reads a buffer containing a QA (table QueryA) query record and
splits it into an array of value strings. Before searching database
table columns for a field value list, pfkeys must be encoded to uints,
and int and float fields must be converted to binary. Char* fields are
copied [and null-terminated?]. Then QueryASetFname(QAcurr, value) from
pr_accessors.c will copy the converted field value to the correct field
QAcurr->Fname of the QA struct or object.

{To add this object to a QA table, and permit access to it in the
pr_accessors.c way, a constructor for this object must be programmed:
(i) pr_create(QA...) followed by a sequence of field initializations
    from the valueList using QASetFname().
(ii) Optionally call pr_link_* if table QA is related to any others,
    such as a GQ superclass.
(iii) Finally, call pr_add to assign a pkey QAid value and add this
    QA_row into the QAtable.}

Logically, child_loop(TT, CK, nCKid, nTTid) makes no sense when
processing a test query: the (converted) value list must be compared to
the sequence of key-component field values of some application table
row-object. These field NAMES can be accessed by name inside
child_loop(CK,CA,CAid,CKid) (which updates CAcurr) as
CAcurr->TAid_pp->fname (or XXGetfname, where XX is
CAcurr->TAid_pp->TTid_pp->ttabb (see metaschema.sch)).

This can be compiled, but not dynamically interpreted at runtime.
Runtime evaluation requires fname to be converted by a macro into
tbl##offset[TAindex(fname)]. This again requires compiling the
declaration and initialization of one map TAindex per table type.
This map converts fname to its field index or to its offset per table
type.
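The per-table name-to-offset map described above can be illustrated in plain C with offsetof. This is only a sketch under stated assumptions: StudentRow, FieldMap, StudentMap, field_index and field_equals are hypothetical names invented for the example, not chgen output and not part of the pr_*.c API. It shows one compiled map per table type, and how a field NAME plus an ASCII value can then be checked against a row at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application row type; NOT generated by chgen. */
typedef struct {
    unsigned id;
    char     name[32];
    int      credits;
    double   gpa;
} StudentRow;

typedef enum { F_UINT, F_INT, F_DOUBLE, F_STR } FieldType;

/* One map entry per field: its name, declared type, and byte offset. */
typedef struct {
    const char *fname;
    FieldType   type;
    size_t      offset;
} FieldMap;

/* The map must be compiled once per table type (here: StudentRow). */
static const FieldMap StudentMap[] = {
    { "id",      F_UINT,   offsetof(StudentRow, id)      },
    { "name",    F_STR,    offsetof(StudentRow, name)    },
    { "credits", F_INT,    offsetof(StudentRow, credits) },
    { "gpa",     F_DOUBLE, offsetof(StudentRow, gpa)     },
};
enum { STUDENT_NFIELDS = sizeof StudentMap / sizeof StudentMap[0] };

/* Return the index of fname in map[0..n-1], or -1 if it is unknown. */
static int field_index(const FieldMap *map, int n, const char *fname)
{
    assert(map != NULL && fname != NULL);
    for (int i = 0; i < n; i++)
        if (strcmp(map[i].fname, fname) == 0)
            return i;
    return -1;
}

/* Convert text according to the field's declared type and compare it
   with that field of row.  Returns 1 if equal, 0 if different, and
   -1 if fname is not a field of this table type. */
static int field_equals(const void *row, const FieldMap *map, int n,
                        const char *fname, const char *text)
{
    assert(row != NULL && text != NULL);
    int i = field_index(map, n, fname);
    if (i < 0)
        return -1;
    const char *p = (const char *)row + map[i].offset;
    switch (map[i].type) {
    case F_UINT:   return *(const unsigned *)p == (unsigned)strtoul(text, NULL, 10);
    case F_INT:    return *(const int *)p == atoi(text);
    case F_DOUBLE: return *(const double *)p == atof(text);
    case F_STR:    return strcmp(p, text) == 0;
    }
    return -1;
}
```

For a row initialized as { 7u, "Ada", 12, 3.5 }, a call such as field_equals(&row, StudentMap, STUDENT_NFIELDS, "credits", "12") compares the converted value against the int field. The cost is the one noted above: every table type needs its own compiled map.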
[There is no free lunch :-(]

Again, I congratulate Chris for attempting to define such a
sophisticated method in the pseudo-code below. If this were a SWEng
class we would have preceded this by a data-flow diagram (DFD) which
shows how methods are used at compile time and at runtime, to get a
sense for what can be accomplished and when, and to verify that the
prerequisite data is available.

Again, code generation opens up new opportunities for code reuse, but
this comes with a price: complexity of code to do reflection (inspect
metaschema tables TT, TA, CK, CA) and to create views (in
schooldb.viewdefs) for test data I/O as well as database transactions.

Finally, note that test data processing is a prelude to actual
application query processing that manipulates candidate keys, so it is
not a waste of time.

R Lechner

> 3) To the rest of the team, I'm providing updated pseudocode here
> based on the revised prototype suggested by Prof. Lechner and using
> the chgen manual. If anyone else has any comments or suggestions for
> changes they'd be more than welcome, suffice to say I am not 100%
> certain about any of this below:
>
> bool keyValueListExists(char* tableType, /* a TTabbrev */
>                         char* CKname,    /* a CK name */
>                         char* valueList) /* list of CK-field values */
> {
>   // find the TT row, my understanding is that pr_find
>   // has some hidden functionality, in that it populates
>   // a global variable with a pointer to the TT row if found
>   // NULL if not found
>   pr_find(TT, TTabb, tableType);
>   if (TTcurr == NULL) return false;
>
>   // use pr_parse to separate values from valueList
>   // I couldn't find the documentation on pr_parse
>   // in the chgen manual
>   // for the sake of this pseudocode I'll assume it'll
>   // break down valueList into a STL vector
>   v = pr_parse(valueList)?
>
>   // the ck and tt ids need to be encoded before being passed
>   // into child_loop
>   int nCKid;
>   int nTTid;
>   encode(ttcurr->ttid, &nTTid);
>   encode(ttcur->ckid, &nCKid);
>
>   int i(0);
>   // my understanding is that child_loop will find and
>   // populate the global CKcurr variable (much like pr_find)
>   // it will loop through the entire linked list updating
>   // CKcurr each time
>   child_loop(TT, CK, nCKid, nTTid)
>   {
>     // compare the values of each ck value
>     // type conversion based on field declarations
>     // is needed before comparing
>     if (v[i] != CKcurr->value) return false
>     i++;
>   }
>
>   return true;
> }
>
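
As a rough illustration of the runtime level discussed in the notes above (split a value list into strings, convert each value, then compare against the key-component fields of each row), here is a minimal standalone C sketch. It deliberately does not use the chgen pr_*.c API. The Enrollment row type, the comma separator, and the functions split_values and key_value_list_exists are all assumptions made for the example. It follows option (A): every value arrives as text and is converted in the test code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VALUES 8

/* Split buf in place at each sep character and store pointers to the
   pieces in out[].  Returns the number of pieces, or -1 if there are
   more than max of them. */
static int split_values(char *buf, char sep, char *out[], int max)
{
    assert(buf != NULL && out != NULL);
    int n = 0;
    char *p = buf;
    for (;;) {
        if (n == max)
            return -1;
        out[n++] = p;
        char *q = strchr(p, sep);
        if (q == NULL)
            break;
        *q = '\0';          /* terminate this piece */
        p = q + 1;          /* the next piece starts after the separator */
    }
    return n;
}

/* Hypothetical application table row whose candidate key is
   (student_id, course). */
typedef struct {
    int  student_id;
    char course[16];
} Enrollment;

/* True if some row in rows[0..nrows-1] has the key values given as the
   text "student_id,course". */
static bool key_value_list_exists(const Enrollment *rows, size_t nrows,
                                  const char *value_list)
{
    assert(value_list != NULL);
    char  buf[128];
    char *v[MAX_VALUES];

    if (strlen(value_list) >= sizeof buf)
        return false;
    strcpy(buf, value_list);            /* split_values modifies its input */
    if (split_values(buf, ',', v, MAX_VALUES) != 2)
        return false;                   /* wrong number of key components */

    int id = atoi(v[0]);                /* option (A): convert in test code */
    for (size_t i = 0; i < nrows; i++)
        if (rows[i].student_id == id && strcmp(rows[i].course, v[1]) == 0)
            return true;
    return false;
}
```

Unlike the quoted pseudo-code, the loop here runs over application rows and returns true on the first row whose key fields all match. In a chgen application the key field names would come from the CK and CA metatables rather than being hard-coded as they are in this sketch.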