Re: 91.522 HW3 - thanks Keith; I've responded now below - and it's still early on Thurs PM. :-) RJL
Ref: $PH/06f522/asgnt3/06f522asgnt3GenCodeReuse_kb061012.txt
See [RJL>>>] responses after each KB>>:

> > From kbagley@us.ibm.com Wed Oct 11 22:46:37 2006
> > To: Bob Lechner
> > CC: 06f522
> > Subject: Re: 91.522 HW3 - keyExists (kb: discussion group,generics)
> >
> > Comments below...
> >
> > Keith Bagley
> >
> >
> > Bob Lechner
> > 10/11/2006 02:44 PM
> >
> > To
> > Keith Bagley/Bedford/IBM@IBMUS
> > cc
> > alison_lea@yahoo.com, Keith Bagley/Bedford/IBM@IBMUS, cguffey@gmail.com,
> > agabriel@cs.uml.edu, nitin.sonawane@verizon.net, nsonawan@cs.uml.edu
> > (Nitin Sonawane), lechner@cs.uml.edu (Bob Lechner)
> > Subject
> > Re: 91.522 HW3 - keyExists (kb: discussion group,generics)
> >
> >
> >
> > Re: 06f522/asgnt3: keyExists (kb: discussion group, generics)
> > Keith, your points are well-taken.
> > My emails are getting too long and repetitive
> > (from year to year, not just week to week :-).
> > I am fine with a discussion group as long as I don't have
> > to manage it, and it is archived (forever :-)
> > on the UML/CS network under $CASE.
> > Is there a volunteer?
> >
> > KB>> Well, I don't know how to set a discussion up on the UMass network,
> > but I've gone ahead and arranged one on Yahoo...You don't have to manage
> > it, and it *is* archived -- maybe not forever, but for a mighty long time.
> > If that's acceptable, perhaps we can start moving this thread over there
> > this weekend...
> >
> > My previous email said forget about float field types.
> > My interest is in overall CK class and method design,
> > in a generic ERD context that supports code generation.
> >
> > I also approve of your ambition to templatize the
> > problem, but we cannot assume the field type list
> > will be unique over all CK's of one TT.
> > (The field type list signature returned inside the
> > CKcurr->CAid child_loop cannot discriminate between
> > subclasses: A field's datatype is returned as
> > CAcurr->TAid_pp->ftype where ftype is a TA-field
> > in metaschema.sch).
> >
> > KB>> Yes...I think I was having a brain-cramp when I read your original
> > message, and got tunnel-vision with the example you gave. For the rest of
> > the group, what about using an incremental matching algorithm that looks
> > at each sub-part of a potentially composite key and returns the result of
> > the aggregated comparison? So, Chris' original pseudocode keyValueExists()
> > would change from this:

[RJL>>>] Yes...: that's fine - this is a good strategy for reflective
use of metadata if you want to program it yourself.

I would like to take this opportunity to put in a shameless plug
for [chgen API] code extension and reuse.

It looks like you've set the stage for using TAcurr->ftype to convert
the query's value field from ASCII to more efficient binary for pfkeys
and ints or floats. This is important because, done the other way
around, the search-for-equality loops are slow: they need binary-to-ASCII
conversions and slow string comparisons against key string values inside
the loop (to which we can't even apply range or type checks).

Notice above that I mentioned the query's value, not CKcurr->value.
Table CK holds metadata which composes a CandidateKey as CK-->CA>--TA.
Input queries must also be parsed (split into string values,
type-checked and converted).

IMHO these query string test data inputs deserve their own Entity type
data format: To achieve maximum chgen API reuse, this format can be
declared in new metatables. Each CK-instance implies a new schema table
type (a TT-row and its TA children). Then, queries can be loaded as
strings and automatically parsed into ASCII fields. Ints and floats are
atoi- or atof-converted, respectively, and pfkeys are encoded
(compressed to uint values).
The encode function is table-type specific: It converts from key_string
to hcg_key, where hcg_key is an unsigned int with a range constraint in
the table TS-row for that TableType TT-row and ViewVersion VV-row.

All of the above code comes free from chgen; it is not user-written.
Chgen DOES require a format definition for each query valuelist
(e.g., CQ is a new table declaration in [meta-?]schema.sch).
If metaschema is prefixed to schema.sch, then CK and CA fall in between.
You could insert CQ tables after CK and CA, but I suggest placing them
at the end of schema.sch, after application table defs.

Now consider the CK-component to TA-component relationship.
If the CK and all its components are for the same table TT,
note that CA models a subset of the direct product of CK X TA
as a sparse matrix. Therefore, via child_loop(CK, CA, CAid, CKid)
we can perform an embedded list traversal of the relevant CA-rows.
[Soukup calls this an intrusive list structure. See $PH/JiriSoukup.]

Inside this CK-->CA loop we can pick up each (and only the relevant)
CAcurr->TAid_pp->ftype values. Although the conversions were done while
pr_loading the query (a CQ-row instance), we still need a switch on an
enumeration of 4 field types {int, key, float, string} to decide between
int, uint, float or string comparisons.

[I would go to the trouble of declaring each query type in the schema
so that pr_parse can be reused. I conjecture that chgen could easily
generate equality (and < or >) tests on every named field, just as
pr_accessors.c did for set and get functions. The CK-->CA loop can
invoke a TA-specific comparison method while it skips over non-key
fields whose values are irrelevant (and probably defaulted if
pr_loaded).]

[Now that we have bitten the bullet and considered generating a per-TA
comparison method, we might also consider generating a variant of it
whose CQ-value parameter is still an ASCII string.
(just like pr_parse has two variants: one encodes values read from a
file, the other skips this part when setting a field value from a
pre-encoded source such as bde's GUI).]

Three questions remain:

(1a) Each CK definition instance may require a test query (CQ) value
list with a different TAcurr->ftype signature. Therefore, will declaring
each CQ to be a subclass of a generic query (GQ) simplify the coding of
application tests?

(1b) Better yet, can CK's lead to new coding inside chgen to enhance
its generated source outputs (specifically, to generate pr_comparators.c
as well as pr_accessors.c)?

(2a) A CK-based instance comparison generalizes an instance equality
test (a class-based comparison over ALL field values) by constraining it
to a comparison over ONLY A SUBSET of field values. Therefore it seems
like a reasonable extension which can reuse code already available from
chgen in new API methods.

(2b) [I hinted at a desire to extend CK-component field specs to include
fields in aggregates layered over the leaf instances to be compared.
Superclass fields are automatically inherited if not overridden. Data
members of aggregate parents are not, so they require customized address
expressions. Generating code for these expressions inside chgen extends
the current XX__YY macros in a big way - and is a different project!]

(3) Note that char* fields like street address probably have a variable
format and we would not like to parse its variants at query time. This
is one good reason to standardize this info by separating its fields
(streetNumber, streetName, apartmentUnitNumber, etc.) and validating
them at data entry time. (I.e., allow only good quality data into
[even test] databases.)

Bob Lechner
--------------------------------------------------------
> >
> > // most details omitted...
> > if (v[i] != CKcurr->value) return false
> >
> > to something like this:
> > // matchArray is a parallel of v used to track if we've matched
> > // the partial key or not
> > if (checkPartialKey(v, matchArray, i) == NOT_FOUND) return false;
> >
> > where checkPartialKey would be something like this (in pseudo C):
> > checkPartialKey( v[], matchArray[], i ) {
> >     if( i==0 ) { // time to end recursive search, so go through
> >                  // matchArray and return result
> >         result = 0; // Each index will be either 1 or 0 (sub-match
> >                     // or not). So, if the result is < size of the array,
> >                     // we know not every term in the composite
> >                     // matched -> NOT_FOUND.
> >                     // If the result == size of array -> FOUND
> >         for( int j=0; j<matchArray.size; j++ )
> >             result += matchArray[j];
> >         if( result == matchArray.size ) return FOUND;
> >         return NOT_FOUND; // we've recursively searched the array of
> >                           // fieldnames and didn't match
> >     }
> >     if( v[i] == CKcurr->value ) matchArray[i] = 1;
> >     else matchArray[i] = 0; // fill match array with result of
> >                             // composite test
> >     return checkPartialKey(v, matchArray, --i);
> > }
> >
> >
> >
> > I'm not convinced I've covered all the bases here, but at least this seems
> > flexible enough to deal with the potential of having candidate keys that
> > are composites and different fieldname sequences for types
> > (e.g. with Prof Lechner's original example
> > "these three candidate keys for the SEction table of schooldb:
> > ,
> > and
> >
> > are all legitimate fieldname sequences with different respective
> > type signatures (uint = unsigned int);
> >
> > , , "
> >
> >
> > Thoughts?
> >
> > Keith
> >
>
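
A minimal C sketch of the typed comparison discussed in the [RJL>>>] response above: each query field is converted from ASCII to binary once, and only the key's component fields are then compared, with a switch over the four field types {int, key, float, string}. Every name here (FieldType, FieldValue, parse_field, field_equal, key_matches) is a hypothetical stand-in and not part of the chgen API. The key_fields index array only plays the role of the CK-->CA child loop, and the strtoul call is a placeholder for the table-specific pfkey encode function, whose real form is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these come from chgen. */
typedef enum { F_INT, F_KEY, F_FLOAT, F_STRING } FieldType;

typedef union {
    int          i;  /* F_INT                           */
    unsigned int k;  /* F_KEY: pfkey encoded as a uint  */
    double       f;  /* F_FLOAT                         */
    const char  *s;  /* F_STRING: left as ASCII         */
} FieldValue;

/* Convert one ASCII query field to its binary form, once, at load time. */
static FieldValue parse_field(FieldType t, const char *text)
{
    FieldValue v;
    switch (t) {
    case F_INT:   v.i = (int)strtol(text, NULL, 10);            break;
    case F_KEY:   v.k = (unsigned int)strtoul(text, NULL, 10);  break; /* placeholder for the real encode */
    case F_FLOAT: v.f = strtod(text, NULL);                     break;
    default:      v.s = text;                                   break;
    }
    return v;
}

/* Compare two values of the same field according to the field's type. */
static bool field_equal(FieldType t, FieldValue a, FieldValue b)
{
    switch (t) {
    case F_INT:    return a.i == b.i;
    case F_KEY:    return a.k == b.k;
    case F_FLOAT:  return a.f == b.f;
    case F_STRING: return strcmp(a.s, b.s) == 0;
    }
    return false;
}

/* True when row and query agree on every field listed in key_fields[].
   Fields that are not key components are never examined. */
static bool key_matches(const FieldType *types, const FieldValue *row,
                        const FieldValue *query,
                        const int *key_fields, int n_key_fields)
{
    assert(n_key_fields >= 0);
    for (int c = 0; c < n_key_fields; c++) {
        int f = key_fields[c];
        if (!field_equal(types[f], row[f], query[f]))
            return false;  /* first mismatch settles the composite test */
    }
    return true;
}
```

Because key_matches returns on the first mismatch, it needs no parallel match array and no recursion, and it tests every listed component, including the one at index 0.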