| The Data Administration
Newsletter (TDAN.com) Robert S. Seiner - Publisher |
|
"There was a young fellow named
Corey
Last issue in an article in this publication - http://www.tdan.com/i013fe04.htm - I presented the first components of a data catalogue (“metadata repository” in the current argot), in data model form. In that article, emphasis was placed on the elements required to support analysis – entities, attributes, relationships, and so forth. This issue was to have covered the area of design, but that topic has turned out to be bigger than expected. For that reason, this issue will cover relational database and system design, while object-oriented design will be saved for a later issue. As before, this article makes the point that, in most situations, there are relatively few very well defined things that we want to keep track of in a catalogue. To model these things should not be very difficult. These articles present a simple set of models to describe a catalogue that will support a typical application. Yes, these are sketches, and they could certainly be made more elaborate. But they should accurately represent at least those things they set out to represent – concisely and in concrete terms. Readers are encouraged to disagree with the particulars of these models. The nice thing about data modeling is that it gives us a very good language with which finally to clarify what we disagree about. ResponsesBefore moving into the design model, it is worth while to respond to some of the correspondents who commented on the last model. Two respondents commented on the “entity view” concept, and both anticipated a change that your author made as soon as the article was published. Figure 1 shows the model from that issue, with each entity view being composed of one or more entity definition elements each of which must be the use of an entity/object class. The problem with this is that each entity view is really a virtual entity, which has many of the same characteristics and relationships as any other kind of entity/object class.
A better approach is shown in
Figure 2. Here, a virtual entity is simply another
kind of entity/object
class. As in the
previous model, this virtual
entity may be composed of
one or more entity definition
elements, each of which must be the use of another entity/object class. Also, as in the previous model,
each virtual entity may be
composed of one or more attribute
definition elements or
relationship definition elements. For example, the
virtual entity “vendor” is composed of the entity party, plus the relationship buyer in one or more contracts.
Note that Figure 2 has one
other change as well. Where
the definition of a virtual
entity is clearly superior, the change from role/relationship end to simply
relationship. Where previously, role/relationship end described
just half of a relationship, here relationship describes the whole
thing, from one entity/object class to another entity/object class. This is a bit more
intuitive, but it does require two sets of attributes – optionality,
cardinality, and a name for the “from” entity... and optionality,
cardinaity, and a name for the “to” entity.... It also means that the model cannot show which end of the relationship is involved in defining the virtual entity. Presumably this can be inferred by the entity/object class that is used in one of the entity definition elements that are the use in the definition of the virtual entity, but this is not guaranteed. Relational Design – Tables and ColumnsSo, enough with the
requirements analysis. What
about design? First, if the
system is to be based on relational technology, we must describe the
database. Table and column are shown in Figure 3. A table is a collection of data
about something, organized into a set of rows and columns. A column defines a particular thing
to be described by the table, and a row is an occurrence
of the thing the table is
about. Each column must be part of a single table, while each table may be composed of one or
more columns. For our purposes here, “Name” is the only attribute required of table, although probably others can be specified. Attributes of column are (in addition to “Name”, of course) “Format” and “Length”. A word about the attributes shown in these diagrams: If the symbol to the left of an attribute name is an
asterisk (*), then the attribute is mandatory. A value must be specified for each
row in the table. If the
symbol is a circle (o), then the attribute is optional. To mitigate controversy in these articles, unique identifiers (primary keys) are not shown.
Relational theory does not provide for the notion of calculated columns. The idea is that a calculated column duplicates the information already contained in the columns from which it was derived. That may be so, but it is often useful to design in (at least conceptually) computed columns. This is shown in Figure 4, with the new sub-types for column. The attribute “in or out indicator” tells whether the calculation is done at the time rows are populated in the table involved, or if it is done only when data are retrieved. In either case, the presence of a computed column implies that a module is present that will do the calculations. The primary attribute of each computed column is “Formula”, and the formula must be parsed into (composed of) one or more computed column elements. Each computed column element may be the use of one and only one other column, or of a “Constant”. The structure here is remeniscent of a Hewlett Packard calculator, that uses the Reverse Polish Notation. An expression such as “A+B*C + 10” is made up of the following elements:
According to relational theory,
each row of a table must be
identified by one or more unique keys. This is shown in Figure 5. Each unique key must be for a table, of course, and it must be
composed of one or more columns. Each unique key must be either a primary key or an alternate key. Note that there are two business rules that apply to this model that cannot be directly represented by it: 1. A table must be identified by only one primary key. It may be identified by any number of alternate keys. 2.
All columns that are
part of a unique key must be part of the same table that the unique key is for. Unique
key has no attributes here
other than “Name”.
A relationship end (from the previous article) is implemented in a relational database by means of a foreign key. That is, a foreign key is the mechanism by which tables can be related to each other. Foreign key is shown in figure 6. Each foreign key represents a constraint on a table. It is a reference to the
primary key of the same or
another table. This is achieved by its
being composed of one or more
foreign key elements, each
of which is a reference to a
column that is part of the primary key for the table being referred to in the
foreign key. Each foreign key element is also the use of a column that is part of the table constrained by the foreign key.
As the above sentences suggest, this model is subject to two business rules that it cannot express directly: 1. All foreign key elements that are part of a foreign key must be references to columns that are part of a single primary key. That primary key is for the table being referred to in the foreign key.
2. Each foreign key element that is part of a foreign key must be the use of a column
that is part of the same table that the foreign key is a constraint on. Indeed, that column must be part of the primary key for that table. Each foreign key has
a “Name”, but there are no attributes for foreign key element.
ModulesNow, about the programs in our
new system. The basic unit of
programming is called a module. (In this catalogue, that is.
Clearly, other terms could be used.)
Each module
must be written in a particular
programming language. Naturally, any programming language worth
mentioning is the language of
many modules. In addition to “Name”, “Scenario” is shown here as an attribute of module. This links this specification to a document that describes the operation of the module. Each programming language has a “Name”.
These days, a large proportion of the program modules written are for the purpose of displaying or entering data through an interactive screen or window. That is, as shown in Figure 8, a module may be the implementation of one or more windows. Here, of course, a “window” is not a collection of glass and wood or aluminum in a wall, but a representation of a body of data on a computer terminal or personal computer. As such, it may have component parts, here called display groups, where a display group is part of a window. That is, each window may be composed of one or more display groups. Indeed, a window may be composed of one or more other windows. Display groups can be
classified, such that each display
group must be an example of
one and only one display group
type. In addition to “Name”, window has the attribute “Modal indicator” specifying whether or not opening the window locks up the screen and prevents the use of other windows.
These windows have stuff on
them. Each has either data
entry fields, buttons, or other icons for various purposes. In Figure 9, we
represent this by showing that each window and each display group may be composed of one or more of what
are here called interface
elements. That is,
each interface element is an
element on the screen that is part
of either a display
group or a window. An interface element must be either a field, displaying or accepting data, a trigger, the touching of which causes some action to take place, or a display element, which is just there to make the window look pretty. A field must be an example of a field type, such as “text field”, “radio button”, and so forth. It must be the use of a column, although this could be a computed column. A trigger is of a module. That is, pointing to and clicking
on a trigger causes a module to be executed. A trigger must be an example of one and only one
trigger type.
Interface element can have lots of attributes. Among them are:
In addition, field may have the following attributes:
ConnectionsSo, what are the relationships between the analysis elements presented in the previous issue and the design elements presented here? Figure 10
shows how a business function may be mapped to a
program module. A function implementation is the
fact that a particular function
is at least in part implemented by a particular module. That is, each function implementation must be of a function as a module. No attributes are required for function implementation, but a “Description” could be useful. Figure 11
shows that an entity may be
implemented with one or more tables
and that each table
may be the implementation of one or more entities. This is represented by
entity implementation, where each entity implementation must be of an entity as a table. That is, each table may be based on one or more entity implementations, each of
which must be of one and only
one entity. Alternatively, each entity may be the basis for one or more entity implementation, each of
which must be as one and only
one table. Just as entities are implemented as tables, then attributes of entities are implemented as columns in tables. An attribute implementation is of an attribute as a column. That is, each attribute may be the basis for one or more attribute implementations, each of
which must be as a column.
Relationship ends are implemented as foreign keys. One of the reasons that foreign keys do not belong in a conceptual entity/relationship model is that they are mechanisms for implementing the relationships that are already shown. There is no need for them, and they clutter that diagram. It is only when we move to relational design that foreign keys come into play. In Figure 12,
a relationship
implementation is the fact that a particular relationship end is implemented by
a particular foreign key. That is, each relationship implementation must
be of a relationship end as a foreign key.
ResponsesAs expressed last issue, should you, dear reader, take exception to any of the models presented above – good! It is about time we had a discussion on the specific content we expect in a repository, instead of being surrounded by fluff pieces talking about what a good idea it is. Tell me exactly which assertions (entities and/or relationships) you disagree with. The purpose of a data model is to be wrong. This one represents your author’s best guess as to the truth, and it is there for people to correct. Please either write to me at davehay@essentialstrategies.com or post your disagreements to the Data Management Mailing list. You may subscribe to this list by sending an e-mail to dm-discuss-subscribe@egroups.com, or go to its homepage at http://www.egroups.com/list/dm-discuss. Alternatively, if you think these models are completely wrong, please submit an article to TDAN describing your counter argument. Send it to rseiner@tdan.com. In your disagreements, I ask only two things: 1. The model is a set of assertions in the form:
(For example, “Each column must be part of one and only one
table; each table may be composed of one and only
one column.) Please express counter assertions in the same form. Yes, it is true that this is an unconventional approach to defining relationship names, and it is hard. But it is hard because to come up with a reasonable name (that sounds perfectly obvious to the reader), you must really understand the nature of the relationship. If UML is used, each <relationship name> can be shown as a role name. 2. If you draw an alternative model, organize it so that the crow’s feet (or the asterisks, if you use UML) are to the left or the top of the model. This tends to put reference entities in the lower right part of the diagram, and intersect or transaction entities in the upper left. It provides a consistent organization for the drawing, and makes it easier for all to see where the differences are. A veteran of the Information Industry since the days of punch cards, paper tape, and teletype machines, Dave Hay has been producing data models to support strategic information planning and requirements planning for over thirteen years. He has worked in a variety of industries, including, among others, power generation, clinical pharmaceutical research, oil refining, forestry, and broadcast. He is President of Essential Strategies, Inc., a consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. He is the author of the book, Data Model Patterns: Conventions of Thought, published by Dorset House, and, more recently, producer of Data Model Patterns: Data Architecture in a Box™, an Oracle Designer repository containing his model templates. He is a member of and has spoken before The Business Rules Group, DAMA International, and the International Oracle User’s Group, as well as local chapters of these. He may be reached at davehay@essentialstrategies.com,
(713) 464-8316, or The Data Administration
Newsletter (TDAN.com) | ||||||||||||||||||||||||||||||||||||||||