From cguffey@gmail.com Thu Nov 2 01:16:36 2006
Date: Tue, 31 Oct 2006 12:43:57 -0500
From: Chris Guffey
To: Anthony J Gabrielson
Cc: Keith Bagley, Alison Miles, Nitin Sonawane, nitin.sonawane@verizon.net, Nitin Sonawane, Bob Lechner
Subject: Assignment #3 continued

I dug through the emails and compiled what I thought were the most important ones for the keyExists portion of the assignment and put them into a single txt file, which is attached. I did this because, at least for me, there's too much info spread around in different places, so I tried to find the most important stuff. Please feel free to add anything to it. I'm sure we'd all like to get assignment 3 finished so we can move on to other things; hopefully this will help simplify things.

[ Part 2, Text/PLAIN (charset: Unknown "ANSI_X3.4-1968") (Name: "assignment3em.txt") 280 lines. ]
[ Unable to print this part. ]

=========================================================

I agree as far as you go below. However, here are some caveats.

(1) All TA-component values must be found in the SAME actual row instance of the table type or TT-row for which the candidate key is defined (e.g. table XX). A TT [meta]object is a static property of the class of objects declared by the TT_row (and its TA's).

(2) After identifying the field name list that must have values that match the candidate key value, you need to do a table_loop on table XX (not TT), searching for this value list in each X-row. ALL fields of a single row must match all the CK-to-CA-to-TA field names' values for success.

(3) This is a compute-intensive search, so for CK's that we know in advance, hash tables or B-trees might be pre-computed and kept up to date while doing insert/update/delete. Macros or templates to do this in C or C++ respectively might be included in the pr_*.c source files.

(4) For asgnt3, assume CK specs are NOT known in advance of compilation. Therefore you will need to interpretively convert a TAid or TAcurr ptr (e.g.
by searching the TA-children of the TT-row for table XX) for fieldname (not value) fname into the compiled field offset in struct XX.

More comments on this:

Each TA that is declared to be a CK-component could become a switch case, inside of which the proper pre-compiled address expression (like XXptr->fname) must be executed; e.g., by extending the chgen routine that generates pr_accessors.c. Here fname at XXptr gets compiled into (XXcurr + offset of field fname from XXptr). [Generic programming experts may know how to use templates for this purpose. Inheritance does not appear to work for getting the correct override access method because TA's are not subclasses of TT.]

What we need is a 'map' from TTcurr->TTabbr (e.g., XX) and TAcurr->fname (e.g. abc) into a runtime-indexed array XXoffset[N] of compilable address expressions (field offsets), so that the expression at this index can access XXcurr->fname. For each table type XX, this reduces to a pre-compiled table of field offsets

    XXoffset[j] = (char *)&(XXcurr->fname) - (char *)XXcurr

(the byte offset of fname within struct XX), where column j of table XX contains field fname (and the j-th TA-child of the TT-table entry for table XX has attribute name fname).

For asgnt3, it would be adequate to manually include this index in the CK-spec meta-data. That is, list the column number as well as (not instead of) the actual fname of each field. But this may not avoid macros inside of switch cases; they may be needed anyway to define correct data-type semantics depending on fname.

Run-time performance of this map is a minor problem, since searching a (short) TT->TA child list takes much less time than searching each row of (big) table XX for fname values matching the CK's field value list.

Generating the integer offset value for each field of a table as a private int data member of its TA_row would make the external .msdat file platform- and compiler-dependent (big- vs little-endian, word alignment problems).
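The offset map described above can be sketched in C with the standard offsetof macro. This is only a sketch under stated assumptions: struct XX, its three fields (id, name, gpa) and the helpers XXfname, XX_field_index and XX_field_addr are hypothetical names chosen for illustration, and none of them is generated by chgen.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <string.h>   /* strcmp */

/* Hypothetical application table row; not a chgen-generated struct. */
struct XX {
    unsigned id;
    char     name[16];
    double   gpa;
};

enum { XX_NFIELDS = 3 };

/* Column j of table XX holds the field named XXfname[j] ... */
static const char *const XXfname[XX_NFIELDS] = { "id", "name", "gpa" };

/* ... which lives XXoffset[j] bytes from the start of the row. */
static const size_t XXoffset[XX_NFIELDS] = {
    offsetof(struct XX, id),
    offsetof(struct XX, name),
    offsetof(struct XX, gpa)
};

/* Linear search of the (short) field-name list; returns -1 if absent. */
static int XX_field_index(const char *fname)
{
    for (int j = 0; j < XX_NFIELDS; j++)
        if (strcmp(XXfname[j], fname) == 0)
            return j;
    return -1;
}

/* Address of column j inside the row at XXcurr. */
static void *XX_field_addr(struct XX *XXcurr, int j)
{
    assert(j >= 0 && j < XX_NFIELDS);
    return (char *)XXcurr + XXoffset[j];
}
```

Because the compiler computes the offsets at build time, nothing platform-dependent has to be stored in the external data file; only the field names and column numbers would appear there.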
(An analogous problem is supplying state action method addresses to the LCP interpreter, as will be seen in Part 2 of this course.)

At run-time, child-list TT---*>TA still needs to be searched for each CAcurr->TAid_pp->fname string in child_loop(CK,CA, CAid, CKid), to compute the corresponding column-index into the TA-offset array. I would enumerate each of the N fnames in the TA-table for table XX as a 0...N-1 integer sequence. A CK-based query can access each key-component value within a single child_loop(TT,TA, TAid, TTid) that accesses the j-th field value at (XXcurr+XXoffset[j]).

For your schooldb test case, defining this table manually as part of the application is probably simpler than extending chgen to produce it automatically for any schema.

---------------------

Thanks Prof. Lechner, I think those last two emails cleared up some things.

1) When I say a 'specific TT', I mean a row in the TT table. I was assuming it was an object of some sort that we'd have direct access to. Now I think we only have indirect access to it?

2) You talked previously about using pr_parse; however, I do not see an entry for it in the chgen manual.

3) To the rest of the team, I'm providing updated pseudocode here based on the revised prototype suggested by Prof. Lechner and using the chgen manual.
If anyone else has any comments or suggestions for changes they'd be more than welcome; suffice it to say I am not 100% certain about any of this below:

bool keyValueListExists(char* tableType,  /* a TTabbrev */
                        char* CKname,     /* a CK name */
                        char* valueList)  /* list of CK-field values */
{
    // find the TT row, my understanding is that pr_find
    // has some hidden functionality, in that it populates
    // a global variable with a pointer to the TT row if found
    // NULL if not found
    pr_find(TT, TTabb, tableType);
    if (TTcurr == NULL)
        return false;

    // use pr_parse to separate values from valueList
    // I couldn't find the documentation on pr_parse
    // in the chgen manual
    // for the sake of this pseudocode I'll assume it'll
    // break down valueList into a STL vector
    v = pr_parse(valueList)?

    // the ck and tt ids need to be encoded before being passed
    // into child_loop
    int nCKid;
    int nTTid;
    encode(ttcurr->ttid, &nTTid);
    encode(ttcur->ckid, &nCKid);

    int i(0);
    // my understanding is that child_loop will find and
    // populate the global CKcurr variable (much like pr_find)
    // it will loop through the entire linked list updating
    // CKcurr each time
    child_loop(TT, CK, nCKid, nTTid)
    {
        // compare the values of each ck value
        // type conversion based on field declarations
        // is needed before comparing
        if (v[i] != CKcurr->value)
            return false;
        i++;
    }
    return true;
}

-----------------------------
RJL comments on pseudo-code below:
-----------------------------

1. (Read chgen user manual): pr_find only searches for primary keys. Therefore this would find the CK-table row with pkey CKpk after encode("CK000001", CKpk) converts the pkey from ASCII to uint:

    pr_find(CK, CKid, CKpk);        // updates CKcurr

but to search table TT for the row that declares table CK requires

    pr_find_str(TT, TTabb, "CK");   // updates TTcurr

(Note that results are returned through side effects of these calls.)

------------------

2. pr_parse has a specific definition and function and output data structure for each table type.
See the chgen User Manual and perhaps pr_load.c for any schema to see how it works. Moreover, it is only defined to work for a particular field sequence that must be declared in metaschema.sch and captured in table TT as a table-row definition. Therefore, if pr_parse code reuse is worthwhile, by all means declare the test data record format as an add-on to schooldb.sch. You might be glad you did, since all pr_*.c functions, not just pr_parse, can load and access test data record content.

[Caveat: the data type of each field is specified in table TA. Therefore, homogenizing the value fields of each CK test query requires either
(A) declaring all attribute types in the value list to be text strings, which are converted by atoi or atof within your test code; or
(B) declaring a separate entity type for each query type depending on its field type sequence (e.g. QA, QB, etc.). Then pr_parse finds out when to apply atof or atoi to each input value field by discovering which specific table type has been read (as identified by the primary key of that query type).]

[QA, QB, etc. could be declared as subclasses of a generic test query table GQ. For asgnt3 this is not worthwhile except as a learning experience for chgen-style delegation, which currently replaces inheritance.]

[Test table schema declarations become even more valuable when you use Q? types in regression tests: evolving test data really justifies a versioned database, which the pr_* API is designed to handle.]

-----------------------

3. I did not check your encode call args for correct datatypes, but the NAME tt is not correct - use TTcurr and TTid as chgen's metaschema requires. Schemas define table abbrevs as UPPER CASE. Schema.h and pr_*.c macros derive all variable names from these.

child_loop(TT, CK, nCKid, nTTid) will fail because nCKid and nTTid are local int (uint?) vars, not schema field NAMES. Look at child_loop's definition (grep child_loop in pr_*.c and *.h) to appreciate this.
CKcurr->value makes no sense unless CK has a field called value. The test (v[i] != CKcurr->value) makes no sense either.

[There are two levels of operation: (a) CK specs are defined as field name lists (possibly in the schema) before code generation, and (b) runtime tests that process queries (e.g. QA) with value lists.]

pr_parse reads a buffer containing a QA (table QueryA) query record and splits it into an array of value strings. Before searching database table columns for a field value list, pfkeys must be encoded to uints, and int and float fields must be converted to binary. Char* fields are copied [and null-terminated?]. Then QueryASetFname(QAcurr, value) from pr_accessors.c will copy the converted field value to the correct field QAcurr->Fname of the QA struct or object.

{To add this object to a QA table, and permit access to it in the pr_accessors.c way, a constructor for this object must be programmed:
(i) pr_create(QA...) followed by a sequence of field initializations from the valueList using QASetFname().
(ii) Optionally call pr_link_* if table QA is related to any others, such as a GQ superclass.
(iii) Finally, call pr_add to assign a pkey QAid value and add this QA_row into the QA table.}

Logically, child_loop(TT, CK, nCKid, nTTid) makes no sense when processing a test query: the (converted) value list must be compared to the sequence of key-component field values of some application table row-object. These field NAMES can be accessed by name inside child_loop(CK,CA,CAid,CKid) (which updates CAcurr) as CAcurr->TAid_pp->fname (or XXGetfname, where XX is CAcurr->TAid_pp->TTid_pp->ttabb (see metaschema.sch)). This can be compiled, but not dynamically interpreted at runtime. Runtime evaluation requires fname to be converted by a macro into tbl##offset[TAindex(fname)]. This again requires compiling the declaration and initialization of one map TAindex per table type. This converts fname to its field index or to its offset per table type.
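As a rough illustration of the runtime half, the sketch below converts each query value string according to a declared field type and compares it against one row through a per-table descriptor array selected by token pasting. Everything in it is an assumption made for the example: table ST, the ftype enum, the fdesc struct, the FIELD_PTR macro and ST_key_matches are hypothetical and are not part of chgen or the pr_* API. In a real schema the types and names would come from table TA.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* strtol, strtod */
#include <string.h>   /* strcmp */

/* Hypothetical field types; a real schema would take these from table TA. */
enum ftype { F_INT, F_FLOAT, F_STR };

struct fdesc { const char *fname; enum ftype type; size_t offset; };

/* Hypothetical application table ST (student). */
struct ST { int sid; char name[16]; double gpa; };

enum { ST_NFIELDS = 3 };

static const struct fdesc STfield[ST_NFIELDS] = {
    { "sid",  F_INT,   offsetof(struct ST, sid)  },
    { "name", F_STR,   offsetof(struct ST, name) },
    { "gpa",  F_FLOAT, offsetof(struct ST, gpa)  }
};

/* Token pasting selects the per-table descriptor array, in the spirit of
   tbl##offset[TAindex(fname)]; the column index j is already resolved. */
#define FIELD_PTR(tbl, row, j) ((const char *)(row) + tbl##field[j].offset)

/* Return 1 if every key column of the row equals its query value string,
   0 otherwise. col[k] is the column index of the k-th key component. */
static int ST_key_matches(const struct ST *row, const int *col,
                          const char *const *val, int ncols)
{
    for (int k = 0; k < ncols; k++) {
        int j = col[k];
        assert(j >= 0 && j < ST_NFIELDS);
        const char *p = FIELD_PTR(ST, row, j);
        switch (STfield[j].type) {
        case F_INT:    /* convert text to int before comparing */
            if (*(const int *)p != (int)strtol(val[k], NULL, 10)) return 0;
            break;
        case F_FLOAT:  /* convert text to double before comparing */
            if (*(const double *)p != strtod(val[k], NULL)) return 0;
            break;
        case F_STR:    /* strings are compared as-is */
            if (strcmp(p, val[k]) != 0) return 0;
            break;
        }
    }
    return 1;
}
```

A keyExists-style search would call a function like this once per row inside a loop over the application table and stop at the first row for which it returns 1.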
[There is no free lunch :-(]

Again, I congratulate Chris for attempting to define such a sophisticated method in pseudo-code below. If this were a SWEng class we would have preceded this by a data-flow diagram (DFD) which shows how methods are used at compile time and at runtime, to get a sense for what can be accomplished and when, and to verify that the prerequisite data is available.

Again, code generation opens up new opportunities for code reuse, but this comes with a price: complexity of code to do reflection (inspect metaschema tables TT, TA, CK, CA) and to create views (in schooldb.viewdefs) for test data I/O as well as database transactions.

Finally, note that test data processing is a prelude to actual application query processing that manipulates candidate keys, so it is not a waste of time.

----------------------------------

Very pseudo, without much error checking and making assumptions about comparison values and correct types.

// ttabbrev is a defined table abbreviation. Assuming CK here.
add_CK_record(char *ttabbrev, char ckName, char *valueList)
{
    pr_init("some_view", "schooldb.dat");

    //loop through all CKs
    table_loop("some_view", ttabbrev)
    {
        // our key value does not exist
        if (!keyExists(ttabbrev, CKcurr->name, valueList))
        {
            int num_vals, i;
            struct TA **attrStruct;

            attrs = ckey_parse(CKname, &num_vals); // Nitin

            for (i = 0; i < num_values; i++)
            {
                struct TA *curTA = attrStruct[i];

                // See if an attr with this value already exists.
                // If not, create it. Assuming that there is some
                // standardized non-string search value assigned to
                // each attr for searching else we have to do some
                // conversions. Must find CAid. Also, this is
                // assuming predictable ordering of values and of
                // ordering of children, else we must do better
                // checking to ensure that we're comparing the right
                // values.
                child_loop(CK, CA, CAid, CKcurr->CKid)
                {
                    // add a new CA and TA for this association
                    if (CAcurr->TAid_pp != curTAid)
                    {
                        struct TA *TA_elt;
                        struct CA *CA_elt;

                        TA_elt = pr_create(TA_elt);

                        // Do we need to set this stuff?
                        TA_elt->TTid = CAcurr->TAid_pp;
                        TA_elt->value = curTA->value;
                        CA_elt = pr_create(CA_elt);

                        CA_elt->CKid_pp = CKcurr->CKid;
                        CA_elt->TAid_pp = TA_elt->TAid;

                        pr_add("some_view", CA, CA_elt);
                        pr_add("some_view", TA, TA_elt);
                    }
                }
            }
        }
    }
}

-------------------------------

> Very pseudo, without much error checking and making assumptions about
> comparison values and correct types.
>
> // ttabbrev is a defined table abbreviation. Assuming CK here.
[RJL: ???????????? Either initialize to or assert !strcmp(ttabbrev,"CK")]
> add_CK_record(char *ttabbrev, char ckName, char *valueList)
[RJL: Again I don't see what 'valueList' has to do with finding a CK-row. That requires a CKname or CK_table-row pkey CKid or a CK-table row# = (CKid modulo 2**16) (that could be declared in a pre-compiled enum of the CKnames).]
[I think you mean a fieldnamelist (fnameList), since the goal of CK-->CA is to declare which TA-rows contain the fieldnames to be queried in future CQ unit test valueLists.]
> {
>     pr_init("some_view", "schooldb.dat");
>
>     //loop through all CKs
>     table_loop("some_view", ttabbrev)
>     {
>         // our key value does not exist
>         if (!keyExists(ttabbrev, CKcurr->name, valueList))
>         {
>             int num_vals, i;
>             struct TA **attrStruct;    <<< [What is this and where from? -RJL]
>
>             attrs = ckey_parse(CKname, &num_vals); // Nitin    <<< SPEC?
>
>             for (i = 0; i < num_values; i++)
>             {
>                 struct TA *curTA = attrStruct[i];    <<<
^^^ [==> so attrStruct[] contains TA_row ptrs????]
>
>                 // See if an attr with this value already exists.
>                 // If not, create it. Assuming that there is some    <<< [Why create it???]
>                 // standardized non-string search value assigned to
>                 // each attr for searching else we have to do some
>                 // conversions. Must find CAid. Also, this is
>                 // assuming predictable ordering of values and of
>                 // ordering of children, else we must do better
>                 // checking to ensure that we're comparing the right
>                 // values.
>
>                 child_loop(CK, CA, CAid, CKcurr->CKid)    <<< CKid, not CKcurr->CKid
^^^ [Remember macro args are name STRINGS, not binary values.]
>                 {
^^^ [OK - updates CAcurr for const CK per iteration]
>                     // add a new CA and TA for this association
>                     if (CAcurr->TAid_pp != curTAid)
>                     {
>                         struct TA *TA_elt;
>                         struct CA *CA_elt;
^^^ [child_loop(CK,CA,,) updates CAcurr, NOT CA_elt!]
>                         TA_elt = pr_create(TA_elt);
>
>                         // Do we need to set this stuff?
>                         TA_elt->TTid = CAcurr->TAid_pp;    <<< [TTid is NOT a TAid_pp!]
>                         TA_elt->value = curTA->value;    <<<
^^^ [Do NOT update (const) metaschema SV/TT/TRA!]
^^^ [Besides, there is no such field in table TA!]
>                         CA_elt = pr_create(CA_elt);    [OK - RJL]
>
>                         CA_elt->CKid_pp = CKcurr->CKid;    [OK - RJL]
>                         CA_elt->TAid_pp = TA_elt->TAid;
^^^ [No - the post-condition is: (CA_elt->TAid_pp == TA_elt && TAid fkey in CA == TAid pkey in TA.)]
>
>                         pr_add("some_view", CA, CA_elt);    <<< OK
>                         pr_add("some_view", TA, TA_elt);    <<<<<