saturn.cs.uml.edu(291)> cat NamespaceAndSchemaIntegration050915.txt
From lechner@cs.uml.edu Thu Sep 15 02:35:29 2005
Subject: NamespaceAndSchemaIntegration (.msdat with SV, TT, TA and app tables)
To: jtan@cs.uml.edu, jingyoko@hotmail.com
Cc: 05f523
Notes on integration of metaschema- and schema-derived metadata in
chgen/gencpp applications:
RJLRef: $PH/COOL-GEN/NamespaceAndSchemaIntegration.htm
[This began as a response to comments by Jing Tan in 05f523. It ended
up articulating a proposal for adding namespace ids as prefixes to
schema ttabbrevs (table-type mnemonics) in the 12-byte-to-32-bit pfkey
compression mode, which has been supported since genv8 but never used.]
The metaschema consists of tables SV, TT and TA, plus the tables
proposed in $PH/COOL-GEN/hcg_struct_migrationR1.ppt (discussed later).
The schema is any other tables needed by the application. These may be
spread over system-service and application domains (e.g. JPsim has
tables from BDE and LCP as well as passive-layout and active-class
subschemas within its process-control application).
.sch and .msdat files are information-equivalent, so one of them is
redundant. The contents of .dat and .msdat files are both flat files
with the same format (field name and type sequence), which depends on
the table-row (object instance) type or class. Since .msdat and .dat
tables have identical metadata formats, .msdat should be used instead
of .sch.
Entity types are assigned int codes as surrogates, based on their
declared order in the schema (or in the TT table). The metadata tables
(SV, TT, TA) are read-only and should come first, so that they have the
same position in all applications. We can then append the .msdat for
the application tables AFTER the .msdat for tables SV, TT, TA. Both
levels of specification will have the same format.
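A minimal C sketch of this ordering rule, assuming codes are 1-based positions (consistent with the later remark about TT rows 1, 2, 3) and that application tables simply follow SV, TT, TA in declared order. The function ttcode() and the table lists are hypothetical; SU, WH, IT are the sample application tables from the slides mentioned further on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: the read-only metaschema tables are always declared
   first, so they get the same codes (1, 2, 3) in every application. */
static const char *const metaschema[] = { "SV", "TT", "TA" };
#define N_META (sizeof metaschema / sizeof metaschema[0])

/* Returns the 1-based type code of ttabbrev, or 0 if it is not declared.
   app_tables holds the application's own table types in declared order. */
int ttcode(const char *ttabbrev, const char *const app_tables[], size_t n_app)
{
    assert(ttabbrev != NULL);
    for (size_t i = 0; i < N_META; i++)
        if (strcmp(ttabbrev, metaschema[i]) == 0)
            return (int)i + 1;
    for (size_t i = 0; i < n_app; i++)
        if (strcmp(ttabbrev, app_tables[i]) == 0)
            return (int)(N_META + i) + 1;
    return 0;
}
```

Because the metaschema list is fixed and always searched first, SV, TT and TA keep codes 1..3 no matter which application tables are appended after them.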
ALL tables are parsed by chgen in the same way:
------------------------
Read the pkey field to identify the table type of each row. If an
internal object with the same pkey already exists, skip this record.
This prevents TT rows 1, 2, 3 and their TA children from being
overwritten or duplicated if chgen already has preloaded copies of
these tables to refer to (as a bootstrapped version will).

Otherwise the parser allocates memory for an instance (row) of that
table type, then repeats fscanf for the intermediate fields in that
table type's TA-child sequence. When the last TA child is reached, it
does the same thing unless the field is of type tnn (text string of
length nn); in that case it reads up to end-of-line into the last
field's buffer. (All tnn and cnn fields are truncated if necessary to
avoid buffer overflow.)
//pr_create(): allocate a row of the table type given by the ttabbrev byte(s) of the pkey
child_loop(TT, TA, TAid, TTid) {
    if (first_child(TT, TA))            // syntax not checked - RJL
        fscanf(fp, "%s", keybuf);       // get pkey
    else if (!last_child(TT, TA, TAid, TTid))
        // get middle children (fkeys first, if any);
        // stringfor() must yield the address of the named field
        fscanf(fp, formatfor(TAcurr->fieldtype),
               stringfor(TAcurr->fieldname));
    else                                // last child: needed for field type tnn
        getchars_untilEOL(fp, lastfieldbuf);
}
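For illustration, a self-contained C sketch of the per-row steps described above, working on one line already read from the file: pkey first, whitespace-separated middle fields, then an optional tnn-style last field read to end of line and truncated to its buffer. It also includes the skip-if-pkey-exists test. Row, parse_row() and pkey_loaded() are hypothetical names. Per the text, real chgen code drives the field sequence from the TA children, whereas here a simple count stands in for that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 8
#define FIELD_LEN  32   /* buffer size; "%31s" leaves room for the NUL */

/* Hypothetical in-memory row: pkey plus up to MAX_FIELDS other fields. */
typedef struct {
    char pkey[FIELD_LEN];
    char field[MAX_FIELDS][FIELD_LEN];
    int  nfields;
} Row;

/* Skip rule: a row whose pkey is already loaded is ignored. */
int pkey_loaded(const char *pkey, const Row loaded[], int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(loaded[i].pkey, pkey) == 0)
            return 1;
    return 0;
}

/* Parse one record line: pkey, nmiddle whitespace-separated fields, then a
   last field. If last_is_text, the last field is tnn-style: everything up to
   end of line, truncated to the buffer. Returns 1 on success, 0 otherwise. */
int parse_row(const char *line, int nmiddle, int last_is_text, Row *row)
{
    const char *p = line;
    int used = 0;

    assert(nmiddle >= 0 && nmiddle < MAX_FIELDS);
    if (sscanf(p, "%31s%n", row->pkey, &used) != 1)
        return 0;
    p += used;
    for (int i = 0; i < nmiddle; i++) {
        if (sscanf(p, "%31s%n", row->field[i], &used) != 1)
            return 0;
        p += used;
    }
    if (last_is_text) {
        size_t n;
        while (*p == ' ' || *p == '\t')
            p++;                            /* skip the separator */
        n = strcspn(p, "\r\n");             /* length up to end of line */
        if (n >= FIELD_LEN)
            n = FIELD_LEN - 1;              /* truncate: no overflow */
        memcpy(row->field[nmiddle], p, n);
        row->field[nmiddle][n] = '\0';
    } else if (sscanf(p, "%31s", row->field[nmiddle]) != 1) {
        return 0;
    }
    row->nfields = nmiddle + 1;
    return 1;
}
```

The `%n` conversion records how many characters each token consumed, so the last text field can be copied verbatim (embedded blanks included) from wherever the middle fields stopped.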
Note 1: formatfor() is a compile-time or run-time lookup of a format
string equivalent to the TA field type. stringfor() needs the stringized
value of the TA field name. (This is a run-time-data-to-compile-time
identifier translation, traditionally done by a
switch(data){...case ident: ... }.) You win a prize if you can avoid
some form of switch or indexed table lookup here :-)
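One way to write the indexed-table-lookup variant in C, offered only as a sketch: formatfor() becomes an array of per-type parse functions indexed by field type, and stringfor() becomes a table mapping each TA field name to its type and offsetof() position in the row struct, so no switch on field names is needed. ItRow, its fields, FieldType and scan_field() are invented for illustration. The name lookup is still a (linear) table lookup, so no prize is claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row struct for an application table; field names invented. */
typedef struct {
    char   name[41];
    int    qty;
    double price;
} ItRow;

typedef enum { FT_INT, FT_DOUBLE, FT_TEXT } FieldType;

/* formatfor(): one parse function per field type, indexed by FieldType. */
static int parse_int(const char *s, void *dst)    { return sscanf(s, "%d", (int *)dst) == 1; }
static int parse_double(const char *s, void *dst) { return sscanf(s, "%lf", (double *)dst) == 1; }
static int parse_text(const char *s, void *dst)   { return sscanf(s, "%40s", (char *)dst) == 1; }

static int (*const parse_for_type[])(const char *, void *) = {
    [FT_INT] = parse_int, [FT_DOUBLE] = parse_double, [FT_TEXT] = parse_text
};

/* stringfor(): TA field name -> field type and byte offset in the row. */
typedef struct { const char *fieldname; FieldType type; size_t offset; } FieldDesc;

static const FieldDesc it_fields[] = {
    { "name",  FT_TEXT,   offsetof(ItRow, name)  },
    { "qty",   FT_INT,    offsetof(ItRow, qty)   },
    { "price", FT_DOUBLE, offsetof(ItRow, price) },
};

/* Parse token s into the field called fieldname. Returns 1 on success,
   0 if the name is unknown or the token does not parse. */
int scan_field(const char *s, const char *fieldname, ItRow *row)
{
    assert(s != NULL && fieldname != NULL && row != NULL);
    for (size_t i = 0; i < sizeof it_fields / sizeof it_fields[0]; i++)
        if (strcmp(it_fields[i].fieldname, fieldname) == 0)
            return parse_for_type[it_fields[i].type](
                s, (char *)row + it_fields[i].offset);
    return 0;
}
```

A generator such as chgen could emit the it_fields table per table type from the TA rows, which keeps the run-time-name-to-field mapping in data rather than in a hand-written switch.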
Note 2: C++ stream I/O could absorb all middle fields with
>> field2 >> field3 >> field4, etc., provided the fields are separated
by whitespace. However, this list of typed field names is only known
AFTER the subclass whose method overrides the abstract one has been
detected, and that class is unknown until the first field (the pkey)
has been read.
------------------
Name-space extensions (chgen14?):
Real apps will have both system-service domains and application
domains, with separate schemas. chgen13 assumes their table types are
non-conflicting, disjoint mnemonic 2- or 4-letter abbrevs; then the
schemas can be concatenated (always in the same order). If they
overlap, or the order is not predictable, or you just want to meet
common-sense scale-up requirements, then a namespace abbrev must be
assigned to each domain and prepended to each table-type abbrev. chgen
CAN support this via its option ??? to use an alternate 4-letter
table-type mnemonic as part of a 12-byte pfkey format (256 subschemas,
each with 256 tables). The remaining 8 bytes hold 3 version digits
[0..999] and 5 row digits [0..99999].
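A sketch of splitting a 12-byte ASCII pfkey into these parts, assuming the byte order is namespace letters, table-type letters, version digits, row digits, and that the mnemonics are uppercase A..Z. Pfkey, read_digits() and parse_pfkey() are hypothetical names, not chgen's API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed layout of a 12-byte ASCII pfkey:
   [0..1] namespace letters   [2..3] table-type letters
   [4..6] version 000..999    [7..11] row 00000..99999 */
typedef struct {
    char ns[3];
    char tt[3];
    int  version;
    long row;
} Pfkey;

/* Convert n decimal digits at s; returns 0 if any byte is not a digit. */
static int read_digits(const char *s, int n, long *out)
{
    long v = 0;
    for (int i = 0; i < n; i++) {
        if (!isdigit((unsigned char)s[i]))
            return 0;
        v = v * 10 + (s[i] - '0');
    }
    *out = v;
    return 1;
}

/* Returns 1 and fills *out if key is a well-formed 12-byte pfkey, else 0. */
int parse_pfkey(const char *key, Pfkey *out)
{
    long v;

    assert(key != NULL && out != NULL);
    if (strlen(key) != 12)
        return 0;
    for (int i = 0; i < 4; i++)
        if (!isupper((unsigned char)key[i]))
            return 0;
    memcpy(out->ns, key, 2);     out->ns[2] = '\0';
    memcpy(out->tt, key + 2, 2); out->tt[2] = '\0';
    if (!read_digits(key + 4, 3, &v))
        return 0;
    out->version = (int)v;
    return read_digits(key + 7, 5, &out->row);
}
```

With letters only, this layout allows 676 * 676 * 100,000,000 = 45,697,600,000,000 distinct keys, which fits easily in a 64-bit integer (about 1.8 * 10^19).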
[chgen's current 32-bit unsigned int hcg_key typedef for the binary
encoding of pfkeys (since chgenv7) constrains these fields to 8+8+16
bits: enough for only 256, not 26*26 = 676, namespace and table-type
codes. 64-bit keys would scale up: 12-byte ASCII keycodes are limited to
676*676*100,000,000 instances if 2 letters each define the (26*26)
namespaces and table types. Instead of 676 domains, I would prefer 26
domains [A..Z], each with 10 [schema] versions [0..9]. This constrains
the first 4 type-selecting bytes to 260 domain*version codes, each with
26*26 = 676 table types.]
Domains could have up to 10 schema versions: the first 2 bytes define
the schema and version (stored as a row of table SV); the next two bytes
define the table type (stored as a row of table TT).
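A sketch of how the proposed 4-byte type prefix could map to indices into SV and TT, assuming ASCII, byte 0 = domain [A..Z], byte 1 = schema version [0..9], bytes 2-3 = table type. sv_code(), tt_code() and their index formulas are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical decoding of the proposed 4-byte type prefix (ASCII):
   byte 0 = domain [A..Z], byte 1 = schema version [0..9],
   bytes 2-3 = table type [AA..ZZ]. */

/* Domain*version code in 0..259 (index of an SV row), or -1 if malformed. */
int sv_code(const char *prefix)
{
    assert(prefix != NULL);
    if (strlen(prefix) < 4 || !isupper((unsigned char)prefix[0])
        || !isdigit((unsigned char)prefix[1]))
        return -1;
    return (prefix[0] - 'A') * 10 + (prefix[1] - '0');
}

/* Table-type code in 0..675 (index of a TT row in that schema), or -1. */
int tt_code(const char *prefix)
{
    assert(prefix != NULL);
    if (strlen(prefix) < 4 || !isupper((unsigned char)prefix[2])
        || !isupper((unsigned char)prefix[3]))
        return -1;
    return (prefix[2] - 'A') * 26 + (prefix[3] - 'A');
}
```

For example, prefix "B3SU" would select domain B, schema version 3 (code 13) and table type SU (code 488).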
---------------------------------------------------------
To appreciate the compactness of this representation, go to the .ppt
show at $PH/DataModels05fr1.ppt. [In an earlier email I suggested you
all need to be familiar with it this month.] First, read slide 51,
"Reflective databases", and slide 52, "Metatables TT and TA", of
$PH/DataModels05fr1.ppt. Slides 55-58 illustrate how schema tables are
drawn in bde, converted to .sch form by b2t|t2s, and augmented by
chgen -metafile with next-row, parent, first-child and next-sibling
pointers, before schema.h, pr_*.c and schema.msdat are generated.
Slide 59, "MetaSchema Tables TT and TA", defines the .msdat file
CONTENT of tables TT and TA for the sample application schema (tables
SU, WH, IT on slides 55-59) as produced by chgen -metafile. Slide 60,
"Meta-tables TT, TA are Self-Describing", defines the content of tables
TT and TA when they are DESCRIBING THEMSELVES!
What I meant by putting the meta-schema tables first is to make chgen
concatenate the content of slide 59 AFTER the content of slide 60. (SV
was later defined as the FIRST table and parent of TT, so it should
PRECEDE what's in slide 60.) This can be done by concatenating
meta-schema.sch and application.sch and feeding the result to chgen.
BTW, the fkey is_key values are 1/-1 and s (not c; that was my typo).
They are explained in slide 41, "Pkeys and fkeys", of
==================================================
hcg_structure migration to .msdat.
$PH/COOL-GEN/TTTA_metadata_jk.ppt (2 slides) documents the
'hcg-structures' currently used in chgen to store schema.sch content
(including views and versions). From these, chgen11+ builds the .msdat
file, but does not use it.
$PH/COOL-GEN/hcg_struct_migration.ppt (7 slides) outlines a project
(not done, TBD: any volunteers?) to migrate chgen's own source code away
from hcg_structs and use the meta-schema content instead. This set also
shows other possible additions to the runtime metadata (view and
version info, table statistics such as row count and pkey range, and
fkey traversal paths for inheritance). These can be saved and reloaded
to avoid calling pr_init to find out the same information. (chgenv14
should make pr_init obsolete.)
Slide 18, "Current Work-arounds", of $PH/COOL-FAQ/COOL_FAQv6.PPT shows
chgen(v14?) as two phases, GENmeta and GENcode (TBD: partition genv13
this way). The (little) chgen phase 1 (GENmeta) would parse the .sch
file, load metadata tables [SV,] TT and TA, and write them to the
.msdat file. The (big) chgen phase 2 (GENcode) would reload the .msdat
file and work from it instead of the hcg-structures. (Since v11, chgen
has been using pr_*.c code internally, i.e. it is a (bootstrapped)
application of itself.)
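A minimal round-trip sketch of the proposed two-phase split, under stated assumptions: phase 1 writes TA-style rows as whitespace-separated text lines, and phase 2 reloads them with a fixed format instead of rebuilding in-memory structures. The TaRow struct, the "TA ..." line layout and the function names are illustrative only and do not reproduce chgen's .msdat format; the field-type strings merely follow the tnn/cnn naming used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical metadata row: one TA entry (a field of some table type). */
typedef struct {
    char tt[5];          /* owning table-type abbrev, e.g. "SU" */
    char fieldname[32];
    char fieldtype[8];   /* e.g. "c8", "t40" */
} TaRow;

/* Phase 1 (GENmeta) sketch: write metadata rows, one per line. */
int write_msdat(FILE *fp, const TaRow rows[], int n)
{
    assert(fp != NULL);
    for (int i = 0; i < n; i++)
        if (fprintf(fp, "TA %s %s %s\n",
                    rows[i].tt, rows[i].fieldname, rows[i].fieldtype) < 0)
            return 0;
    return 1;
}

/* Phase 2 (GENcode) sketch: reload up to max rows; returns the count read. */
int read_msdat(FILE *fp, TaRow rows[], int max)
{
    char line[128];
    int n = 0;

    assert(fp != NULL);
    while (n < max && fgets(line, sizeof line, fp) != NULL)
        if (sscanf(line, "TA %4s %31s %7s",
                   rows[n].tt, rows[n].fieldname, rows[n].fieldtype) == 3)
            n++;
    return n;
}
```

Usage: write with write_msdat() to a file (or tmpfile()), rewind() it, and read_msdat() returns the same rows; lines that do not match the format are simply skipped.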
(grep hcg_ in chgen/src to see how complex the hcg_struct references
are. That may motivate you to want this refactoring too ;-)
PS: You don't want ALL 1299 hcg_ refs (most are trivial):
mercury.cs.uml.edu(44)> cd $CASE/gen/ver_13/chgen/src
mercury.cs.uml.edu(45)> grep hcg_ *.c | wc
   1299    8403  107160
But search for the 16 refs to the items below (from TTTA_metadata_jk.ppt),
and only in the pr_*.c files in chgen/src that process tables TT and TA:
mercury.cs.uml.edu(70)> grep '(hcg_ts_list|hcg_table_seqlist|ts_list|ts_type)' pr_*.c | wc
     16      68    1553
----------------------------------