From omg-list-errors@amethyst.omg.org Tue Jul 25 10:01:15 2006
Subject: Valete
From: "Berrisford, Graham"
To: "A&D TF"
Cc:
RJLRef: $PH/OMG/uml2mda_GBerresfordValete060725.txt

This being my last day with Atos, this is probably my last contribution to an OMG discussion. I'm creating a knowledge community/repository web site for stuff that interests me, and the cc line contains my home email address if you want news of that.

For me, a real-world object (be it Hurricane Martha, or Theseus' Ship, or My Grandfather's Axe) is the history associated with a real-world identifier. Persistence is key. Real-world identifiers are key. And a real-world class is the set of objects that somebody labels using one range of real-world identifiers. Below is a long discussion of identifiers.

For me, events are as important as objects, but it is hard to envisage event-orientation being accommodated in a meta model of the object-oriented paradigm, since the *effect* of an event on an object (in the EO paradigm) is not necessarily a single *operation* in the OO paradigm.

________________________________

Business identifiers, OMG meta models and UML CASE tools

I want to ask about CASE tool support for defining identifiers, mainly because I want to stimulate you to say what you intend or expect the OMG meta models and UML CASE tools to support. I hope you don't find this an abstract or philosophical discussion. One practical concern here is what kind of CASE tool is useful to professional analysts, architects and designers building high-level, business-level, CIM-level models. Another is whether the OO paradigm is a help or a hindrance in this area.

As you know, defining the structural model of a business, or the data processed in a business, involves much more than drawing an entity-relationship diagram. The larger part of the work is in defining attributes and the data associated with attributes - especially descriptions, data types, constraints and keys.
The definition of keys needs a lot of care and often involves considerable debate. How best to define the identifier(s) of an entity type or class? Should we define a natural key? A surrogate key? Or both? How should we form multi-part keys? And how well do CASE tools help us?

Some data modelling tools are not very good at helping us define keys, and some UML tools are downright poor. Most CASE tools can readily import and export entity, attribute and relationship names, but the import and export of data types, constraints and keys can be more troublesome.

In practice, CASE tools record data types, constraints and keys in different ways because there is no grand Unified Modelling Theory, only some limited and so-far-not-wholly-reconciled modelling paradigms. The two best-known paradigms - relational and OO - are at least partly represented in the CWM and UML meta models of the OMG. But is either paradigm or meta model enough for high-level, business-level, CIM-level models? Should a truly universal meta model start from a different place altogether? How many meta models does MDA need? I'll come back to this at the end, below.

DEFINITION OF MULTIPLE CANDIDATE KEYS

The criteria for a candidate key are that for each entity instance the key's value must a) be unique and b) persist over the life of the entity. Often, an entity type has only one candidate key. But it may have several candidate keys, and that is significant in the discussion below.

Do you know a CASE tool that makes it easy to define/mark the several candidate keys of an entity type? Is this something you would expect a UML tool to support?

DEFINITION OF CONSTRAINTS ON ATTRIBUTE GROUPS

The analyst has a duty to mark every potential candidate key in a list of attributes. One way is to attach two business rules of a kind called CONSTRAINTS to the attribute or group of attributes that form the candidate key, to declare a UNIQUENESS CONSTRAINT and a NO UPDATE CONSTRAINT.
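As an illustration only, here is a minimal Python sketch of the idea of attaching a uniqueness constraint and a no-update constraint, separately, to an attribute or a group of attributes. The names `Constraint`, `EntityType` and the Customer attributes are hypothetical, invented for this sketch; they are not taken from any CASE tool or OMG meta model. A group carrying both constraints is treated as a candidate key.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraint:
    """Hypothetical rule attached to one attribute or a group of attributes."""
    attributes: tuple
    unique: bool = False      # UNIQUENESS CONSTRAINT
    no_update: bool = False   # NO UPDATE CONSTRAINT

    def is_candidate_key(self):
        # Both rules together are what make the group a candidate key.
        return self.unique and self.no_update


@dataclass
class EntityType:
    name: str
    constraints: list
    rows: list = field(default_factory=list)

    def candidate_keys(self):
        return [c.attributes for c in self.constraints if c.is_candidate_key()]

    def _clashes(self, c, row, skip=None):
        # True if another stored row has the same value for this attribute group.
        value = tuple(row[a] for a in c.attributes)
        return any(
            i != skip and tuple(r[a] for a in c.attributes) == value
            for i, r in enumerate(self.rows)
        )

    def insert(self, row):
        for c in self.constraints:
            if c.unique and self._clashes(c, row):
                raise ValueError(f"{self.name}: duplicate value for {c.attributes}")
        self.rows.append(dict(row))

    def update(self, index, changes):
        old = self.rows[index]
        new = {**old, **changes}
        for c in self.constraints:
            if c.no_update and any(new[a] != old[a] for a in c.attributes):
                raise ValueError(f"{self.name}: {c.attributes} may not be updated")
            if c.unique and self._clashes(c, new, skip=index):
                raise ValueError(f"{self.name}: duplicate value for {c.attributes}")
        self.rows[index] = new


# Assumed example entity: two candidate keys, plus one attribute that is
# unique but may still change (so it is not a candidate key).
customer = EntityType("Customer", [
    Constraint(("customer_number",), unique=True, no_update=True),
    Constraint(("country", "tax_id"), unique=True, no_update=True),
    Constraint(("email",), unique=True),
])
customer.insert({"customer_number": "C1", "country": "UK", "tax_id": "17",
                 "email": "a@example.org"})
customer.update(0, {"email": "b@example.org"})   # accepted: email has no no-update rule
print(customer.candidate_keys())
```

Keeping the two constraints as separate flags is the point of the sketch: an attribute group can be unique without being fixed, and only the combination marks a candidate key.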
Do you know a CASE tool that makes it easy to define (separately) a uniqueness constraint and a no-update constraint on an attribute? Do you know a CASE tool that makes it easy to attach such a constraint to a group of attributes? Are these things you would expect a UML tool to support?

DEFINITION OF A PRIMARY KEY

Selecting one of the candidate keys as the PRIMARY KEY usually has two implications: first that this key will be the primary means of locating an entity instance, second that related entities will include this key among their attributes to identify (and point to) this entity.

Do you know a CASE tool that makes it easy to define a primary key as one of several candidate keys? And then to change your mind about which candidate key? Is this something you would expect a UML tool to support?

DEFINITION OF NATURAL KEYS AND SURROGATE KEYS

NATURAL KEYS are those that exist in the business world. A key may be physically attached to a real-world object in the form of a label, a card, a bar code or RFID tag. Identifiers like Customer Numbers, Account Numbers, Agreement Ids and Order Numbers are often satisfactory candidate keys, because they were invented by business people to serve as identifiers. Attributes like dates and addresses are not such good candidates, though a date can be used in a multi-attribute key as below.

Where there is no natural key, then we have to invent a by-this-system-generated SURROGATE KEY that is guaranteed to be unique and fixed for the life of an entity instance. (We are talking here about a surrogate key that is stored in a persistent database, not an object identifier that is generated to identify an object instance in an object-oriented program and exists only as long as an instance is retained in the memory of that program.)

Any entity may have a natural key, a surrogate key, or both. Where there is a natural candidate key, then you don't need a surrogate key. Business people prefer to work with natural keys wherever possible.
But some designers insist on introducing a surrogate key for *every* entity type, regardless of whether a more natural key exists. I'll come back to discuss surrogate keys in more detail at the end below.

Do you know a CASE tool that makes it easy to define one or more candidate keys as being by-this-system-generated rather than assigned externally by humans or some other technology? Is this something you would expect a UML tool to support?

DEFINITION OF FOREIGN KEYS

Given that an association relationship connects a child entity to one (and no more than one) parent entity, the identity of the parent is an attribute of the child. This means every candidate key of the parent is potentially an attribute of the child.

The relational paradigm assumes a child "inherits" its parent's primary key as a FOREIGN KEY. But potentially, a child could inherit any of the parent's candidate keys, not just the primary key.

Most analysts copy a parent's natural key into the child entity's description - even though this foreign key is implied by the association relationship. Because this helps to keep the business context in your head when you are buried in the depth of a complex model. And because the business people you talk to understand natural keys better than surrogate keys.

The trouble is, the relational paradigm implies copying only the primary key, and if the primary key is a surrogate, then copying down only the surrogate key means you lose the business context after all.

Any decent data modelling tool will help you automatically copy primary keys into child entities as foreign keys. Do you know a CASE tool that will help you copy any candidate key you choose into the attribute list of the child entity? And remove any such foreign key? Is this something you would expect a UML tool to support?
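To make the point about inheriting a candidate key other than the primary key concrete, here is a small sketch using Python's standard sqlite3 module. The table and column names (customer, customer_order, customer_number) are assumptions made up for this example. The parent has a surrogate primary key, but the child copies down the natural candidate key as its foreign key, which SQLite permits because the referenced column is declared UNIQUE.

```python
import sqlite3


def build_schema():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE customer (
            customer_id     INTEGER PRIMARY KEY,   -- surrogate primary key
            customer_number TEXT NOT NULL UNIQUE,  -- natural candidate key
            name            TEXT NOT NULL
        );
        CREATE TABLE customer_order (
            order_number    TEXT PRIMARY KEY,
            -- foreign key copied from the natural candidate key,
            -- not from the surrogate primary key
            customer_number TEXT NOT NULL
                REFERENCES customer (customer_number)
        );
    """)
    return conn


def place_order(conn, order_number, customer_number):
    """Return True if the order was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO customer_order VALUES (?, ?)",
                     (order_number, customer_number))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
conn.execute("INSERT INTO customer (customer_number, name) VALUES (?, ?)",
             ("C-1001", "Acme"))
print(place_order(conn, "O-1", "C-1001"))   # parent exists
print(place_order(conn, "O-2", "C-9999"))   # no such parent
```

A row in customer_order then carries the business identifier of its parent directly, so the business context survives even though the parent's primary key is a surrogate.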
DEFINITION OF MULTI-ATTRIBUTE KEYS THAT BUILD ON FOREIGN KEYS

Where the parent of a child entity is MANDATORY and FIXED, then any of the parent's candidate keys can be extended to become a candidate key for the child entity. There are two ways to do this. You may extend one parent's key to become a HIERARCHICAL (aka COMPOSITE) KEY, or combine two parents' keys to become a COMPOUND KEY.

E.g. OrderItem (OrderNumber, ItemNumber, Amount, ProductType) shows a HIERARCHICAL KEY.
OrderItem (OrderNumber, ProductType, Amount) shows a COMPOUND KEY, assuming the constraint that one order cannot have two items for the same product type.

You can build a multi-attribute key on top of a surrogate key, but surely it is better to build on a natural key if you can?

Some CASE tools make you list key attributes twice - on their own and as part of a key. Do you know a CASE tool that makes it easy to combine several attributes that are not listed contiguously into a key? Is this something you would expect a UML tool to support?

DEFINITION OF SURROGATE KEYS EVERYWHERE?

Some designers introduce a surrogate key for *every* entity type, regardless of whether a more natural key exists. Why?

Efficiency arguments do not obviously point one way or the other. Adding a surrogate key does not necessarily decrease the total data storage requirement. Whatever access speed advantage might be gained by using a single-attribute key may be swamped by the total cost of maintaining indexes and performing queries, since you still need to enforce the two standard constraints on every natural candidate key - whether by using a database constraint or handling it in the application.

The principal argument for surrogate keys is agility - the ability to change the properties of an object - to tighten or relax constraints on any attribute or attribute group other than the surrogate key. But you don't always need that agility.
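The following sketch, again using Python's standard sqlite3 module, shows what "you still need to enforce the two standard constraints on every natural candidate key" can look like once a surrogate key has been added. It reuses the OrderItem example, and for illustration declares both the hierarchical key and the compound key on the same table, which assumes the constraint that one order cannot have two items for the same product type. The table, column and trigger names are hypothetical. Uniqueness is handled by UNIQUE clauses and the no-update rule by a trigger; this is one possible way of doing it in the database, not the only one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE order_item (
    order_item_id INTEGER PRIMARY KEY,       -- surrogate key
    order_number  TEXT    NOT NULL,
    item_number   INTEGER NOT NULL,
    product_type  TEXT    NOT NULL,
    amount        INTEGER NOT NULL,
    UNIQUE (order_number, item_number),      -- hierarchical key
    UNIQUE (order_number, product_type)      -- compound key
);

-- No-update constraint on the attributes that take part in the natural keys.
CREATE TRIGGER order_item_keys_fixed
BEFORE UPDATE OF order_number, item_number, product_type ON order_item
BEGIN
    SELECT RAISE(ABORT, 'natural key attributes are fixed');
END;
"""

INSERT = ("INSERT INTO order_item "
          "(order_number, item_number, product_type, amount) VALUES (?, ?, ?, ?)")


def accepted(conn, sql, params=()):
    """Run one statement; report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

print(accepted(conn, INSERT, ("A1", 1, "widget", 5)))   # first item on order A1
print(accepted(conn, INSERT, ("A1", 1, "gadget", 2)))   # repeats (order, item number)
print(accepted(conn, INSERT, ("A1", 2, "widget", 2)))   # repeats (order, product type)
```

Note that the surrogate key saves none of this work: the two UNIQUE clauses and the trigger are needed whether or not order_item_id exists.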
CG>> "I have seen the results of giving every entity type a surrogate key and found it causes no end of confusion for developers, and in some cases even designers."

TS>> "You still need the relationships to the parent entities to keep the business sense. I have seen databases where the PDM has taken the [surrogate key] for everything approach and not kept the [natural keys] in the child tables. Because there was a need to create new versions of records at the drop of the hat, we came to the ludicrous situation where we had to carry forward the original [surrogate key] a child record in order to keep the context of what a new record actually relates to. If we'd kept the [natural keys] this would not have been necessary."

MW>> "The primary benefit of a surrogate key is stability over the life of the database implementation - but I'd never expect users to work with them. I've seen too many fights over whether [natural keys] should be primary keys when the whole thing could be avoided by setting up the [natural key] as a [candidate key] for exposure to the users and quietly [using] a surrogate key that is never exposed to any user [as the primary key]. The developer in this pattern only has to explicitly worry about the technical intricacies (such as they are) of a surrogate key when performing an INSERT."

CB>> I observe that people with a relational background hate surrogates and those with an OO background like them - because they are close to OIDs. I like them but only if an effort is made to hide them from the end user. Ideally they should be hidden from the programmer as well but SQL won't allow you to do that except (to a limited extent) by using views. [An] advantage of surrogates is that all attributes can be updateable and null. Design time concerns about choosing between candidate keys disappear which also makes it easier to redesign - sorry refactor - production databases.
Some applications demand [surrogates], like the databases that support CASE tools themselves because you don't want to commit to a final name at the time the data is first entered.

It is somewhat ironic that while many of those most strongly tuned into modeling come from a relational database background, and OO practitioners are more likely to be anti-modelers or at least agile developers who only tolerate throw-away models, UML is resolutely OO.

HOW MANY META MODELS DOES MDA NEED?

What the OMG say is that "The MOF is not intended only as the meta model for a repository of concepts defined using UML, or [limited] to models defined in UML. Data models also participate in this environment, and non-UML modeling languages can partake also, as long as they are MOF-based." OMG web site.

But it seems in practice, people want MOF to help them ease exchange of data between OO program specification repositories, so MOF reflects the OO paradigm. The OO designer's assumption is that identifiers will be defined in the manner of the OO paradigm - a distinct system-generated object identifier for each base class. So, I hear MOF basically describes an OO database - with surrogates/OIDs rather than primary and foreign keys.

This OO paradigm for defining identifiers is surely not what people want for building high-level, business-level, CIM-level models. As long as MOF has no relational features, UML-centric CASE tools for drawing class diagrams won't help you define primary keys and foreign keys. And they don't address questions relating to candidate keys posed above.

I'm told the CWM (common warehouse metamodel) does not address all the questions relating to candidate keys posed above.

Most CASE tools for building data models assume identifiers will be defined in the manner of the relational paradigm - primary keys and foreign keys. But even these don't support candidate key definition in the manner discussed above.
06f622 Assignment 3: (Due Sept 19, 2006) RJLRef: $PH/06f522/asgnt3/06f522asgnt3Due060919.txt [inspired by Graham Berresford's 'swan song' in $PH/OMG/uml2mda_GBerresfordValete060725.txt]