From lechner@cs.uml.edu Fri Oct 6 03:29:43 2006
From: Bob Lechner
Subject: Answers to some FAQs on asgnt3 (data models and chgen)
To: alison_lea@yahoo.com, kbagley@us.ibm.com, cguffey@gmail.com,
    agabriel@cs.uml.edu, nitin.sonawane@verizon.net,
    nsonawan@cs.uml.edu (Nitin Sonawane),
    lechner@cs.uml.edu (Bob Lechner)

TO: 06f522
RJLRef: $PH/06f522/asgnt3/06f522asgnt3faq.061006.txt

Here are my comments on three questions about asgnt2:

Please consult the Larman page references below. Chapters 9, 13, 16,
and 17 are important to understanding and using UML class [data] models.

Chgen/gencpp is a useful illustration of how metadata can be used to
implement an application database prototype by automatically generating
code to support database persistence, access, and navigation.

R Lechner

==================================================
1. Do fkeys imply forward and backward linked-lists?
----------------------------------------------------------
> Please correct me if I'm wrong on this but anytime you add a foreign
> key to a table, say table XX, chgen adds XX_fcp (and XX_bcp?) to the
> parent table? Is this correct? I did not understand this aspect while
> doing assignment #2 so in my first design I tried to replicate that
> functionality using the CK, SK, and CI tables.
>

Correct: _fcp always, _bcp iff chgen is run with option -bp.
(But these relate ONLY to chgen-based implementations, not to data
models in general.)

An understanding of chgen is NOT required to produce or read data
models. Data models help us define AND understand information about the
real world and represent it for processing by software.

Note: You MUST understand the differences among (and NOT confuse) these
3 intimately related concepts:
(1) a declaration of a struct or class (table row instance),
(2) a declaration of a table as a List container of row instances,
(3) the definition of the actual content of each table row-instance,
    including its inter-object associations.

[In Larman p.
136: Definition: What are conceptual classes?, (1) above defines the
Intension, while (2) and (3) define the Extension, or set of examples
to which the conceptual class applies.]

[The next 3 paragraphs were duplicated at the beginning of section 4,
so I removed them. - RJL061111]

==================================================
2. Does UML depend on chgen, or vice versa?
------------------------------------------
> This leads me to a second question. In order to properly understand
> the UML, don't you also need to understand chgen? The UML implies some
> hidden functionality in that the linked list is automatically created
> by chgen. This kind of eliminates the blackbox aspect between the UML
> and the chgen program itself right?
>

I think you have that backwards: UML depends IN NO WAY on my code
generators in //www.cs.uml.edu/~lechner/COOL-GEN/chgen...

However, in order to understand chgen/gencpp (and their generated code)
you MUST understand Entity-Relation Diagram (ERD) data models for
Relational Databases, preferably EERD's (ERD's extended with
inheritance). EERD's are a subset of UML class diagram data models.

Chgen/gencpp is [only] an implementation technique: it explains what is
'under the hood' of a simple but effective implementation of code
generation support for a persistent database (whose lifetime is much
longer than that of the code that supports it).

Chgen/gencpp also illustrates one way to automatically generate code
from a MINIMAL but SUFFICIENT data or class model: one with objects,
scalar data members, and inter-object associations of two types,
aggregation and inheritance:
(1) chgen reads the app.sch file.
(2) chgen generates app.h which DECLARES 1. and 3.
(3) chgen generates pr_*.c which HELPS you to create/update/delete
    struct or object instances (and their associations) and maintain
    them in an external ASCII database (versioned file).

You write row-constructors to create a struct or object in 3 steps:
a. a call to pr_create to allocate space,
b. initialization of each field value (including fkeys) by a sequence
   of calls to either
   (i)  pr_set_type(pkey, fldname, fldvalue) [deprecated] or
   (ii) tablenameSetFldname(pkey, fldvalue) (new, in pr_accessors.c),
c. a call to pr_add, which (at the last minute) creates a new unique
   pkey in the object and calls the pr_link macros for each fkey value.
   Each fkey value identifies the object as (either but not both) a
   member of an aggregate or an instance of a subclass.

[A pre-condition to step c is that all parent and/or superclass
instances already exist. This satisfies the Principle of 'Referential
Integrity'.]

=========================================================
3. Can is_key extensions define candidate keys?
----------------------------------------------
> Also rather than create multiple new tables, could you have just added
> a new key value say 2, so we have for possible values {0,1,2,S} where
> 2 has all the properties of 1 but can also be considered a candidate
> key?

You lost me here. Perhaps you mean add '2' to the value set {0,1,S} of
is_key (not key)? That would certainly be appropriate to mark the
fields of a single multi-column composite key. (This could include
fkeys but not the surrogate pkey, which is a non-composite key in
itself.)

In an RDB schema the components of the primary key are underlined to
allow a composite pkey. However, a different tag would be needed to
denote a field's participation in each candidate key. The number of
these tags is variable, so they become a repeating group at the TA
table or metadata level (Not 3NF!).

In general, a CK could be any element of the 'power set' of all
non-pkey fields. The TA list of a CK and the CK list of a TA are just
two asymmetric representations of the symmetric binary relation that
table CA represents as a sparse matrix.
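
As a minimal sketch of that symmetric relation, the C fragment below
stores a handful of CA rows and derives both one-sided views from them.
It assumes that TA rows stand for table attributes and CK rows for
candidate keys. The integer ids, the sample rows, and the function
names tas_of_ck and cks_of_ta are hypothetical; none of them come from
chgen or its generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row layout: one CA row pairs one candidate key (CK) with
   one table attribute (TA). Integer ids stand in for surrogate pkeys. */
typedef struct {
    int ck_id;
    int ta_id;
} CA;

/* Only the nonzero cells of the CK x TA membership matrix are stored. */
static const CA ca_table[] = {
    {1, 10}, {1, 11},   /* CK 1 = {TA 10, TA 11}, a composite key */
    {2, 12},            /* CK 2 = {TA 12} */
    {3, 11}, {3, 12},   /* CK 3 = {TA 11, TA 12} */
};
static const size_t ca_count = sizeof ca_table / sizeof ca_table[0];

/* "TA list of a CK": copy up to max attribute ids of candidate key ck_id
   into out and return the total number of matching CA rows. */
size_t tas_of_ck(int ck_id, int *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < ca_count; i++) {
        if (ca_table[i].ck_id == ck_id) {
            if (n < max)
                out[n] = ca_table[i].ta_id;
            n++;
        }
    }
    return n;
}

/* "CK list of a TA": the same scan, read in the other direction. */
size_t cks_of_ta(int ta_id, int *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < ca_count; i++) {
        if (ca_table[i].ta_id == ta_id) {
            if (n < max)
                out[n] = ca_table[i].ck_id;
            n++;
        }
    }
    return n;
}
```

Both lookups scan the same CA rows, so neither direction is
privileged. A real implementation would presumably follow linked lists
from the CK or TA row instead of scanning the whole table.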
[Including both 'set-valued' attributes provides efficient two-way
access, as does the associative entity CA. - RJL061111]

The CK to CA and TA to CA one-to-many relation links highlight the
visibility of this pair of repeating groups.

Surrogate keys:
----------------
The DB literature definitions of primary, foreign, surrogate and
candidate keys are pretty standard by now. The pkey is a particular one
of the candidate keys. The standard definition of surrogate key (a
machine-readable but not necessarily human-readable encoding of the
primary [p]key) is given in
http://www.cs.uml.edu/~lechner/Obj-RelDBv2/sld005.htm
It doesn't pay to use 'surrogate' for other meanings.

==============================================
4. How do application (analysis/conceptual) models differ from
   domain (design/implementation) models?
---------------------------------------------------
Larman tries hard to separate the conceptual, GUI or presentation, and
software implementation layers for an application.
[(p. 136:) Definition: Are Domain and Data Models the same thing?]

[Larman makes this important point: Conceptual classes without any data
attributes are perfectly valid: they can have a purely behavioral role
in the domain, instead of an information role. For example: the
ActiveClass AC and/or ActiveInstance AI of COOL-LCP (Part II of this
course) are aggregates of StateModels (SMs). SMs define behavior at the
level of event-driven and conditionally guarded control-flow. AC, AI
and SM themselves contain no application-specific data.]

You DO need to read Larman sections 9.2 to 9.4 (pp. 134-136) on Domain
Models. Otherwise you will be confused (as I was) by Larman (p. 307)
which says "entity objects are the application-independent (and
typically persistent) domain software objects".

At first I disagreed with him here: To me, most if not all persistent
objects really depend on the application domain and vice versa.
However, Larman distinguishes (more than I tend to do) between objects
implemented in software and their 'domain object' counterparts in a
more abstract (simplified) conceptual modeling layer or analysis model
of the system.

In 9.4 (p. 138) Larman states that similarity of naming between the
application or conceptual domain model (a 'real' service) and the
domain layer (implemented as Smalltalk 'services') is a Key Idea in OO,
because it supports a 'lower representational gap' between our 'mental
model' of the domain and its software representation.

[OOP can't take all the credit for this - Charles Bachman invented the
Data Structure Diagram (DSD) in 1969, well before Chen's ER Diagram
(TODS 1/1976). (Bachman's CACM Storage Models paper illustrates this.)]

I agree fully. Chgen/gencpp merely take this similarity-of-naming
technique to an extreme level of simplification and integration.

I believe the real reason for two modeling layers is what Larman states
(9.3, p. 138) as a much more important advantage of OOP:
"it can support the design of elegant, loosely-coupled systems that
scale and extend easily, as will be explored in the remainder of this
book."
This exploits OOP's ability to encapsulate information (thereby
implementing Parnas' Information Hiding concept, 1972). Later Design
Pattern examples will help clarify this.

ERD's are extended to EERD's by adding 'gen-spec' relations between a
generic or base class and its specialized or derived sub-classes. Code
to process gen-spec relations can be implemented by either inheritance
or delegation. This meaning of delegation is consistent with Larman
(17.8, p. 287): "[UI objects] need to delegate (forward the task to
another object) the request to domain objects in the domain layer".

Chgen currently uses delegation, and gencpp does also for upward
compatibility; but gencpp can (and should) be extended to exploit the
inheritance semantics of OOPLs like C++ and Java.
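
The delegation approach to a gen-spec relation can be sketched in C
roughly as follows. This is an illustration under stated assumptions,
not chgen output: the Shape and Circle tables and every function name
are hypothetical. The create/set/add sequence only mirrors the roles of
pr_create, the field accessors, and pr_add described in section 2; the
real signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Generic (base) row. All names are hypothetical, not generated code. */
typedef struct Shape {
    int  pkey;            /* surrogate key, assigned only on add */
    char label[16];
} Shape;

/* Specialized row: it refers to its generic part through an fkey. */
typedef struct Circle {
    int    pkey;
    int    shape_fkey;    /* fkey value naming the generic Shape row */
    Shape *shape;         /* link resolved when the row is added */
    double radius;
} Circle;

enum { MAX_ROWS = 8 };
static Shape *shape_table[MAX_ROWS];
static int shape_rows = 0;
static int circle_rows = 0;

/* Step a: allocate a zeroed row. */
Shape  *shape_create(void)  { return calloc(1, sizeof(Shape)); }
Circle *circle_create(void) { return calloc(1, sizeof(Circle)); }

/* Step b: a field setter in the style of a generated accessor. */
void shape_set_label(Shape *s, const char *text)
{
    strncpy(s->label, text, sizeof s->label - 1);
    s->label[sizeof s->label - 1] = '\0';
}

/* Step c for the generic row: assign a fresh pkey and store the row.
   Returns the new pkey, or 0 on failure. */
int shape_add(Shape *s)
{
    if (s == NULL || shape_rows >= MAX_ROWS)
        return 0;
    shape_table[shape_rows] = s;
    s->pkey = ++shape_rows;
    return s->pkey;
}

static Shape *shape_find(int pkey)
{
    return (pkey >= 1 && pkey <= shape_rows) ? shape_table[pkey - 1] : NULL;
}

/* Step c for the specialized row: refuse the add unless the generic row
   named by the fkey already exists (referential integrity). */
int circle_add(Circle *c)
{
    if (c == NULL)
        return 0;
    Shape *parent = shape_find(c->shape_fkey);
    if (parent == NULL)
        return 0;
    c->shape = parent;
    c->pkey = ++circle_rows;
    return c->pkey;
}

/* Delegation: Circle has no label of its own and forwards the request. */
const char *circle_label(const Circle *c)
{
    assert(c != NULL && c->shape != NULL);
    return c->shape->label;
}
```

With inheritance in C++ the Shape members would be part of every Circle
object and no forwarding function would be needed. With delegation the
two rows stay separate and are joined only through the fkey link, which
is why the generic row has to be added first.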

UML does NOT depend on my code generation tool chgen.
================================================