From lechner@cs.uml.edu Tue Oct 10 00:14:57 2006
From: Bob Lechner
Subject: Re: 91.522 HW3 - prototype suggestion for keyExists
To: cguffey@gmail.com (Chris Guffey)
Date: Tue, 10 Oct 2006 00:17:48 -0400 (EDT)
Cc: alison_lea@yahoo.com, kbagley@us.ibm.com, cguffey@gmail.com,
    agabriel@cs.uml.edu, nitin.sonawane@verizon.net,
    nsonawan@cs.uml.edu (Nitin Sonawane), lechner@cs.uml.edu (Bob Lechner)
In-Reply-To: <2da105360610091201k78bdd973i4dd6ce01c897b661@mail.gmail.com>
    from "Chris Guffey" at Oct 09, 2006 03:01:35 PM

> From cguffey@gmail.com Mon Oct 9 14:58:53 2006
> From: "Chris Guffey"
> To: "Anthony J Gabrielson"
> Subject: Re: 91.522 HW3 - prototype suggestion for keyExists
> Cc: "Bob Lechner" , alison_lea@yahoo.com,
>     kbagley@us.ibm.com, nitin.sonawane@verizon.net,
>     "Nitin Sonawane"
>
> Prof. Lechner,
> There's one thing I'm not completely clear on:
> Is keyExists(...) supposed to search a specific TT to see if the key
> is there or is it suppose to find the TT (if any) where the key
> exists?
>
> I'm thinking the former is true, right? One of your responses clouded
> this a little for me.
>

My draft asgnt3 in $PH/06f522/asgnt3/asgnt3DraftDueOct12.txt requires
clarification of the signatures for the method ckeyExists (or keyExists).
I was hoping you would agree on its signature, but here goes:
keyExists really means keyValueListExists, which its argument-type
signature would have clarified:

    keyValueListExists(char* tableType,  /* a TTabbrev */
                       char* CKname,     /* a CK name */
                       char* valueList)  /* list of CK-field values */

This signature makes clear that a database object should be found (or
inserted) that has a specified sequence of VALUES for a certain
sub-sequence of FIELDS in the object's ORDERED list of data members.
The order of columns in a table is defined by the TA-child-list order
for the TT-row corresponding to the parent table.

What this signature does NOT make clear is that the data type sequence
for the value string to be parsed DEPENDS ON WHICH CK our query is based
on. E.g., these three candidate keys for the SEction table of schooldb:
, and are all legitimate fieldname sequences with different respective
type signatures (uint = unsigned int); , ,

Thus parsing each field value in the list depends at runtime on the
value of meta-attribute CAcurr->TAid_pp->ftype (= I4, F4, c##, t##, or
pfkey), as specified by TA-table metadata in application schema
schooldb.sch.

One (inelegant) solution is to imitate dprint.h, where dprint...
function names have type signatures appended (d, f, s, dd, ds, etc.).
The type signature parser would traverse a tree of switch cases, using
each (enumerated) field type in turn as a switch argument.

Conclusion to Chris' initial question: I think the latter is true, not
the former, BUT I can't be sure until he (like Anthony) clarifies what
he means by 'search a specific TT'. TT is NOT the table of database
'objects' to be searched, unless we are passed a CK-component value list
to look for in a candidate key query on table TT. But then we are not
searching the actual database for a row of some arbitrary schema table
of type XX with metadata under a corresponding meta-object (an entity
type declaration, which is stored in a TT-row and its TA-children).
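To make the type-dependent parsing concrete, here is a minimal sketch of a value-list parser that switches on each field's type in turn. Everything in it is an assumption for illustration, not the GEN code: the FieldType enum merely stands in for the ftype meta-attribute, FieldValue and parse_value_list are hypothetical names, and the value list is assumed to be comma-separated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags standing in for the ftype meta-attribute
 * (I4, F4, c##/t## strings, pfkey). Not the real GEN enum. */
typedef enum { FT_I4, FT_F4, FT_CHARS, FT_PFKEY } FieldType;

typedef struct {
    FieldType type;
    union {
        long  i4;
        float f4;
        char  text[32];   /* c## / t## strings and pfkey text */
    } u;
} FieldValue;

/* Parse a comma-separated valueList against the type sequence of ONE
 * candidate key. Returns 0 on success, -1 on a type or count mismatch. */
int parse_value_list(const char *valueList, const FieldType *types,
                     size_t ntypes, FieldValue *out)
{
    char buf[256];
    assert(valueList && types && out);
    if (strlen(valueList) >= sizeof buf) return -1;
    strcpy(buf, valueList);              /* work on a private copy */

    char *cursor = buf;
    for (size_t i = 0; i < ntypes; i++) {
        if (cursor == NULL) return -1;   /* fewer values than fields */
        char *comma = strchr(cursor, ',');
        if (comma) *comma = '\0';        /* isolate the current value */
        char *end;
        out[i].type = types[i];
        switch (types[i]) {              /* field type drives the parse */
        case FT_I4:
            errno = 0;
            out[i].u.i4 = strtol(cursor, &end, 10);
            if (errno || end == cursor || *end != '\0') return -1;
            break;
        case FT_F4:
            errno = 0;
            out[i].u.f4 = strtof(cursor, &end);
            if (errno || end == cursor || *end != '\0') return -1;
            break;
        case FT_CHARS:
        case FT_PFKEY:
            if (strlen(cursor) >= sizeof out[i].u.text) return -1;
            strcpy(out[i].u.text, cursor);
            break;
        default:
            return -1;
        }
        cursor = comma ? comma + 1 : NULL;
    }
    return cursor == NULL ? 0 : -1;      /* leftover values are an error */
}
```

A caller would first look up the type sequence for the named CK in the metadata, then pass that sequence along with the value string. The same string can parse under one CK and fail under another.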
Look at $PH/DataModels05fr1.ppt slides 52-60; slide 60 shows table TT's
entry has pkey TT000001 in metaschema.msdat. (It has pkey TT000002 if
table SV is declared first in metaschema.sch.) If all 5 (or 7)
metaschema.sch table types are declared before application tables (e.g.
tables SU, WH, IT on slide 59), then application tables become rows 6 to
8 (or 8 to 10) of the combined .msdat file of meta-table and table
formats. Slides 61 and 62 discuss this sub-schema merging problem.
[chgen could be modified to merge meta-schema and schema table
definitions while parsing schema.sch.]

E.g. for schoolDB, we may be asked to find an existing row of table EN
[ENrollment] with value sequence <91,522,291> for the field name
sequence of tables DE, CD, and SE respectively.

Suppose course sections are scheduled on the UML campus in 10 timeslots
per week, and in 500 rooms (avg). This is a total of 5K SEction records
(SE-table rows). [If each section has 20 students (avg) there are
10*500*20 = 100K Enrollment records over all course sections. Sanity
check: 10K students enrolled in 4 course sections (avg) means 40K
ENrollment records, not 100K.]

A naive implementation of a keyExists(SE, CKname, Ckvaluelist) query
might search the entire SEction table of 5K records. But a tree-search
might only search (say) 20 DEpartments, then 50 courses of that ONE dept
(if found), then 2 sections of that ONE course (if found). That is one
advantage of a tree-structured or network-structured database
implementation.

[PS: It's true that XML coding into a list of item pairs would avoid
column-order-dependence of an object's data fields. But XML, like source
code, is totally vulnerable to shuffling the order of records in its
line-oriented text representation. A GEN-style database requires a fixed
field or column order in each table, but it is immune to any shuffling
of its rows, which can be optionally resorted and then re-loaded.]
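The tree-search estimate can be sketched as a walk down DE -> CD -> SE that counts how many rows it compares. With the figures given above, the worst case is about 20 + 50 + 2 = 72 comparisons, against up to 5K for a flat scan of the SEction table. The struct layout and the names Dept, Course, Section and find_section are assumptions for illustration only; GEN's actual parent-child links are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory hierarchy: each parent row owns its child list. */
typedef struct { int secNo; } Section;
typedef struct { int courseNo; const Section *sections; size_t nSections; } Course;
typedef struct { int deptNo; const Course *courses; size_t nCourses; } Dept;

/* Walk DE -> CD -> SE for the key <deptNo, courseNo, secNo>.
 * *probes receives the number of rows compared along the way.
 * Returns the matching section, or NULL if any level has no match. */
const Section *find_section(const Dept *depts, size_t nDepts,
                            int deptNo, int courseNo, int secNo,
                            size_t *probes)
{
    assert(probes != NULL);
    *probes = 0;
    for (size_t d = 0; d < nDepts; d++) {
        ++*probes;
        if (depts[d].deptNo != deptNo) continue;
        const Dept *de = &depts[d];             /* the ONE dept */
        for (size_t c = 0; c < de->nCourses; c++) {
            ++*probes;
            if (de->courses[c].courseNo != courseNo) continue;
            const Course *cd = &de->courses[c]; /* the ONE course */
            for (size_t s = 0; s < cd->nSections; s++) {
                ++*probes;
                if (cd->sections[s].secNo == secNo)
                    return &cd->sections[s];
            }
            return NULL;   /* course found, section absent */
        }
        return NULL;       /* dept found, course absent */
    }
    return NULL;           /* dept absent */
}
```

Each level prunes all siblings of the matched parent, which is where the saving over a flat scan comes from. The sketch uses linear search within each level, so a sorted or indexed child list would cut the count further.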