Subject: COOL Framework for MDD, and SWEng risk-aversion
CS Dept mini-seminar, Wed Sep 21
RJLRef: $PH/05f523/RJLtalk2CS050921.htm [,.txt]
This is what I would like to cover in my 10-min talk at the CS seminar on faculty research areas (on Wed 9/21 3-4PM in OS311).
But alas, time is too short :-(
I teach project-oriented OOAD/SWEng courses (91.522/523/524?).
These all involve an ongoing legacy code project called the COOL Framework for what is now called Model Driven Development, or MDD.
[If this sounds like a sales pitch for COOL, it is: there is plenty of applied research as well as development to do on MDD tools and their integration.]
MDD is the latest buzzword for computer-based aids for software development. OMG, home of the UML 2.0 (Unified Modeling Language) standard, calls it Model Driven Architecture (MDA); the Open Group generalized it to MDD to avoid some OMG constraints. Microsoft calls it Software Factories using Domain-Specific Languages (DSLs).
The goal of MDD is code generation from static data models (EERD or UML Class Diagrams) and from dynamic behavior models (Control-State Transition Diagrams). Our COOL has three components: GEN, LCP and BDE. All three are layered together and make use of one another. My goal is to use them as a test-bed for MDD concepts by practicing what I preach (i.e., using COOL tools to evolve COOL itself).
The skills that go into MDD include information modeling (as in relational database design and OOP Design Patterns), data-driven applications (as in lexical analyzers and compilers), and of course O-O Programming (as in C++ or Java), as well as threads and concurrent processing for distributed applications.
-----------------------------------------------------
2. Goals of COOL Framework for MDD
The goal of MDD is automatic code generation from model-based software designs. Commercial developers are used to generating access code from models of static data structures, such as Entity-Relationship Diagrams for relational databases.
Embedded system developers are familiar with state machine models for dynamic behavior. They have been using state-machine modeling tools for years (e.g., iLogix, Kennedy-Carter, PathFinder Solutions, and Bridgeport/ProjectTechnology, now Accelerated Technology, part of Mentor Graphics).
COOL is my own 3-component architecture for Model-Driven-Development (MDD). The Applied Math side of my brain loves the abstract modeling aspect of MDD; my engineering side wants to build things, and experience makes me believe that the real problems (and pitfalls) of MDD can only be appreciated by actually implementing COOL as a prototypical version of MDD, and applying it to non-trivial applications.
What applications could be more non-trivial than the components of COOL itself? That question can only be answered by applying COOL to real-world applications, after extending COOL to allow collaborative development by a team of distributed and dispersed software engineers (simulated by a network of students :-)
Design-time Advantages of COOL Framework
COOL components can turn a graphic design into an effective prototype simulation when augmented by writing low-level application-specific state action routines:
· COOL can support hyper-linked browsing through multiple levels of the design.
· COOL development targets support for database-change-logging and replay to
    o trace execution at the state model level (in LCP-based prototypes)
    o replay the log backward for repeated Undo (e.g. in the BDE editor)
    o do regression testing (event-driven and otherwise unreproducible)
    o measure test coverage at the flow control (STD) level.
· COOL research targets distributed system development and execution via
    o synchronous collaborative data exchange
    o database replication at distributed sites
    o on-line synchronous interaction among multiple distributed clients
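
The change-logging and replay idea in the list above can be illustrated with a minimal sketch. It is written in Python rather than in the C/C++/Java that GEN emits, and the names LoggedTable, put, delete, undo and replay are hypothetical, not COOL's API. It assumes every change is recorded as (key, old value, new value), with None standing for "no such row", so None itself cannot be stored as a value.

```python
class LoggedTable:
    """A dict-backed table that logs every change so it can be undone or replayed."""

    def __init__(self):
        self.rows = {}   # primary key -> row value
        self.log = []    # entries are (key, old_value, new_value); None means "absent"

    def put(self, key, value):
        self.log.append((key, self.rows.get(key), value))
        self.rows[key] = value

    def delete(self, key):
        self.log.append((key, self.rows.get(key), None))
        self.rows.pop(key, None)

    def undo(self):
        """Reverse the most recent change by reading the log backward."""
        if not self.log:
            return False
        key, old, _new = self.log.pop()
        if old is None:
            self.rows.pop(key, None)   # the row did not exist before this change
        else:
            self.rows[key] = old
        return True


def replay(log):
    """Rebuild a table by replaying a saved log forward (e.g. for a regression run)."""
    table = LoggedTable()
    for key, _old, new in log:
        if new is None:
            table.delete(key)
        else:
            table.put(key, new)
    return table


table = LoggedTable()
table.put("n1", "Start")
table.put("n1", "Idle")
table.put("n2", "Busy")
saved = list(table.log)     # snapshot of the log for later replay

table.undo()                # removes n2
table.undo()                # restores n1 to "Start"
print(table.rows)           # {'n1': 'Start'}
print(replay(saved).rows)   # {'n1': 'Idle', 'n2': 'Busy'}
```

Because the same log serves both directions, repeated Undo and reproducible regression runs come from one mechanism; tracing and coverage would read the same entries.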
-------------------------------------
LCP State Model Interpreter
LCP is an interpreter and simulator for State Transition Diagrams (STDs). Like flowcharts, these can be nested in a hierarchy and (as every compiler student knows) can be called recursively. They help visualize the control branching that is often hidden by deferred testing of flag variables.
(As a first cut, replace Boolean flags by state code bits; then simplify the resulting STD.)
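
To make the flags-to-state-bits step concrete, here is a small hypothetical example (not taken from COOL): a routine that counts commas outside quoted strings. The first version hides its branching in two Boolean flags. The second packs the flags into a two-bit state code (bit 0 = in_quote, bit 1 = escaped) and drops the combination 0b10, which can never occur, leaving a three-state STD.

```python
def count_commas_flags(text):
    """Flag-based version: control flow is hidden in deferred flag tests."""
    in_quote = False
    escaped = False
    count = 0
    for ch in text:
        if escaped:
            escaped = False
        elif in_quote and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quote = not in_quote
        elif ch == "," and not in_quote:
            count += 1
    return count


# State codes built from the two flag bits; 0b10 is unreachable and is dropped.
PLAIN, QUOTED, ESCAPED = 0b00, 0b01, 0b11


def count_commas_states(text):
    """State-based version: each branch is an explicit transition."""
    state = PLAIN
    count = 0
    for ch in text:
        if state == PLAIN:
            if ch == '"':
                state = QUOTED
            elif ch == ",":
                count += 1
        elif state == QUOTED:
            if ch == "\\":
                state = ESCAPED
            elif ch == '"':
                state = PLAIN
        else:                      # ESCAPED: consume one character
            state = QUOTED
    return count


sample = 'a,"b,\\"c,",d'
print(count_commas_flags(sample), count_commas_states(sample))   # 2 2
```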
However, state models are most useful for representing cooperative multi-processing or threads. They can describe control flow at any level of granularity:
· At a coarse level they can model distributed systems whose states wait for client/server requests with unpredictable order and time-of-arrival.
· At a fine level they can show conditional and iterative execution of basic blocks of non-branching code, which compilers love to identify as they parse your source code. (Martin in the UK has reverse-engineered 60K lines of assembly code into such models.)
· Simulating State Models is at least as old as Harel's StateCharts and his (expensive!) iLogix StateMate tool. The challenge is to integrate GEN and LCP with a GUI in a computer-aided design environment (framework).
-------------------------------------------
Version Control for the Design Database
All three COOL components are driven by design data that can be stored in any vendor's RDB. This alone might justify using XML as an intermediate format. (Currently I have a hard time curbing student enthusiasm for this :-)
However, for highly data-driven software prototyping applications, it makes more sense to use a source code version control system like CVS/RCS for external storage. The rationale here is that design data defines the runtime data structures, and is either turned into source code automatically by GEN or interpreted directly by LCP.
Regardless of how this design data is captured, it must evolve through a branching tree of versions just like the source code itself. Hence COOL can use a version control system (CVS) to maintain its design database and not just source code. Efficient version control depends on saving incremental differences; GEN's output code manages a persistent relational database whose normalized flat tables store ASCII-formatted, line-oriented data that is CVS-compatible.
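
A rough sketch of why line-oriented flat tables suit a tool like CVS: if each row is one ASCII line in a stable order, a design change shows up as a small line-level difference. The pipe-delimited layout, the dump_table helper and the sample State table below are assumptions for illustration only; the actual .dat format is not shown in these notes. Python's standard difflib stands in for the diff that CVS/RCS would compute.

```python
import difflib


def dump_table(name, columns, rows):
    """Serialize a flat table as one ASCII line per row, sorted by primary key."""
    lines = [f"# table {name}: {'|'.join(columns)}"]
    for row in sorted(rows, key=lambda r: r[0]):
        lines.append("|".join(str(field) for field in row))
    return lines


columns = ["id", "name", "diagram"]
v1 = dump_table("State", columns, [(1, "Idle", 10), (2, "Busy", 10)])
v2 = dump_table("State", columns, [(1, "Idle", 10), (2, "Running", 10), (3, "Done", 10)])

# Only the renamed row and the added row appear as changed lines.
diff = list(difflib.unified_diff(v1, v2, "State.dat@v1", "State.dat@v2", lineterm=""))
print("\n".join(diff))
```

Sorting by primary key keeps the row order stable between versions, so the stored delta reflects design changes rather than incidental reordering.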
COOL component integration:
The GEN component produces code to manipulate data structures, such as field get and set operators, iterators over data sets, and persistent storage and retrieval. It can customize this code to arbitrary database models, which the modeler first normalizes in graphics or text form. Generating database manipulation code from data models is state of the art today; it is no longer a novelty. The challenge is to integrate both data and state models into a system for both design capture for a prototype and runtime evaluation.
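
As a toy illustration of schema-driven generation (not GEN itself, which emits C, C++ or Java), the sketch below turns a hypothetical one-entity schema into Python source with get/set accessors and a line-oriented dump method, then compiles and uses the result. SCHEMA, generate_class and to_line are invented names; iterators and retrieval are left out for brevity.

```python
# Hypothetical metadata: entity name -> ordered list of field names.
SCHEMA = {"State": ["id", "name", "diagram_id"]}


def generate_class(entity, fields):
    """Return Python source text for a record class with get/set accessors."""
    lines = [f"class {entity}:"]
    lines.append(f"    def __init__(self, {', '.join(fields)}):")
    for field in fields:
        lines.append(f"        self._{field} = {field}")
    for field in fields:
        lines.append(f"    def get_{field}(self):")
        lines.append(f"        return self._{field}")
        lines.append(f"    def set_{field}(self, value):")
        lines.append(f"        self._{field} = value")
    joined = ", ".join(f"str(self._{field})" for field in fields)
    lines.append("    def to_line(self):")
    lines.append(f"        return '|'.join([{joined}])")
    return "\n".join(lines) + "\n"


source = generate_class("State", SCHEMA["State"])
namespace = {}
exec(source, namespace)        # compile the generated code
State = namespace["State"]

s = State(1, "Idle", 10)
s.set_name("Busy")
print(s.get_name())            # Busy
print(s.to_line())             # 1|Busy|10
```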
Smooth inter-operation of COOL components has been a goal from day one:
· GEN depends on a common data model that it shares with LCP and BDE. (GEN even bootstraps itself from an earlier version, to internalize metadata and use it to produce data management code.)
· LCP and BDE both depend on GEN to produce code for their internal workings.
· BDE can depend on LCP to execute its control flow (we currently hide this dependency in comments). BDE can also be used as an output display for suitable application prototypes.
· The above dependencies are one-way, not circular. BDE closes the loop offline by capturing design data for GEN to convert to compilable code, and for LCP to bind and interpret.
· BDE is designed to take advantage of LCP state models, although BDE does not yet use LCP. LCP can use BDE as a GUI (display-only mode currently). Input metadata for GEN and LCP can be derived from BDE diagrams, although both currently use textual definition work-arounds.
GEN and LCP collaboration:
· Code to manipulate data structures, such as field get and set operators, iterators over data sets, and persistent storage and retrieval, is generated by the GEN component.
· Versions of GEN, in various stages of completion, can generate source code for C, C++ and Java. GEN also provides logging and replay of persistent data changes as they occur inside the application prototype.
· The design data that LCP interprets represents (and can be derived from) State Transition Diagrams. LCP also functions as an event queue manager and dispatcher, to simulate a prototype for a possibly distributed concurrent application.
· In this way LCP controls the runtime behavior of processes, and provides run-time visibility at the level of state transitions and lower-level function calls.
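
The LCP role described above (interpret a transition table, queue and dispatch events, expose a trace) can be sketched in a few lines. This is a simplified stand-in under stated assumptions, not LCP's design: the TRANSITIONS table, the Interpreter class, the valve actions and the coverage measure are all hypothetical, and there is a single process with no nesting or distribution.

```python
from collections import deque

# Hypothetical design data: (current state, event) -> (action name, next state)
TRANSITIONS = {
    ("Idle", "start"): ("open_valve", "Filling"),
    ("Filling", "full"): ("close_valve", "Idle"),
}


class Interpreter:
    def __init__(self, transitions, actions, initial):
        self.transitions = transitions
        self.actions = actions     # action name -> callable taking the interpreter
        self.state = initial
        self.queue = deque()       # pending events, first in first out
        self.trace = []            # transitions taken, for run-time visibility

    def post(self, event):
        self.queue.append(event)

    def run(self):
        """Dispatch queued events until the queue is empty."""
        while self.queue:
            event = self.queue.popleft()
            key = (self.state, event)
            if key not in self.transitions:
                continue           # no transition for this event in this state
            action, next_state = self.transitions[key]
            self.actions[action](self)
            self.trace.append((self.state, event, next_state))
            self.state = next_state

    def coverage(self):
        """Fraction of defined transitions exercised so far."""
        taken = {(state, event) for state, event, _next in self.trace}
        return len(taken) / len(self.transitions)


log = []
actions = {
    "open_valve": lambda machine: log.append("valve open"),
    "close_valve": lambda machine: log.append("valve closed"),
}
machine = Interpreter(TRANSITIONS, actions, "Idle")
for event in ["start", "bogus", "full"]:
    machine.post(event)
machine.run()
print(machine.state, machine.trace, machine.coverage())
```

Keeping the transition table as data is what lets the same structure be edited as a diagram, stored in the design database, and interpreted at run time.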
---------------------------------------------------------
Block Diagram Editor BDE
The third COOL component is a Block Diagram Editor, BDE. The value of BDE is its ability to present the design as data models and state models in a two-dimensional form, as distinct styles of directed graphs or block diagrams. BDE can create and edit multiple graphic diagram styles for inclusion in the design database:
o Extended Entity-Relation data models and/or UML class diagrams.
o Dynamic behavior models (State Transition Diagrams).
o Other diagram types can be generated, but our focus has been on exploiting the semantics of EERDs to generate code that manipulates data, and of STDs to define the rules for control states and program sequencing.
o BDE can traverse the equivalent of hyperlinks up and down multiple levels of abstraction.
Example: see the Juice Plant Demo at http://www.cs.uml.edu/~lechner/JP2html/
User Guide for BDE: ../COOL-BDE/BDEUserGuide2005/bdeUG_2005.htm, bdeUG_2005.ppt
-----------------------------------------------
Tradeoffs among levels-of-abstraction:
Graphic data models must always trade off concrete completeness against big-picture context (info-hiding abstraction).
o The big picture requires visualization of inter-class-level and object-level interactions (event and message flows).
o Completeness requires attention at a fine-grain level of detail, and is best left to textual definitions of declarative and operational programs.
o Complex systems require partitioning (separation of concerns) and decomposition (step-wise refinement). Complex data models (entity classes and associations) can be partitioned into sub-schema.
o Nodes representing such sub-schema can be hyperlinked to lower-level ER model or subschema diagrams.
o Entity class nodes on data models can include class method names that are hyperlinked to nested state diagrams that model these methods.
o State models can be decomposed into smaller (and faster) sub-models or to leaf states that contain or hyperlink to source code to define their actions. The LCP interpreter calls the corresponding compiled routines.
o Example: see the Juice Plant Demo at http://www.cs.uml.edu/~lechner/JP2html/
-----------------------------------------------
Student projects and the COOL Framework
In encouraging student projects, I often get highly motivated students who propose to migrate an existing legacy code base to the latest band-wagon technology.
I hate to discourage such highly motivated students, but my advice is to first find, and then concentrate on, the factors that carry high risk relative to their value in such projects, in order to discover future problems as early as possible and do what is feasible within time and budget.
For example, XML formatting is a low-risk factor. Defining XML schema input to chgen/gencpp is perhaps the lowest-risk factor, because I know how to do it: our database is a set of flat relational tables, and these tables can be converted to/from XML without first selecting a spanning tree (and risking a non-unique representation). This merely requires converting all foreign keys, those within the spanning tree as well as those outside of it, to/from XML IDREF data items.
With a tree-structured pr_dump option (low-risk, TBD), any .dat file can be mapped directly into XML trees. Generating the code for these 2-way converters for any schema.msdat file is an extension of chgen/gencpp. (XML's IDREF types are used for the residual foreign keys outside of the spanning tree, in what is in general a network of relationships.)
Again, this is a well-understood problem, hence low-risk and deferrable; it is not even on the direct path toward persistent data exchange (OMG/XMI), because tables (and messages) can be converted to/from XML. We don't need to first select a spanning tree (and risk its non-uniqueness). We merely convert all foreign keys within the database to/from XML IDREF data items, not just the ones outside of the spanning tree.
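
A minimal sketch of the flat (no spanning tree) conversion, assuming hypothetical tables in which every row has an "id" column and foreign keys hold the id of the referenced row. Each row becomes one XML element under a common root, and every column becomes an attribute. The ID and IDREF typing would be declared in a DTD or schema, which is not shown; Python's xml.etree.ElementTree does not validate, so the dangling_refs helper does the referential check by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat tables; "diagram" in State is a foreign key to Diagram.id.
TABLES = {
    "Diagram": [{"id": "d1", "title": "JuicePlant"}],
    "State": [
        {"id": "s1", "name": "Idle", "diagram": "d1"},
        {"id": "s2", "name": "Busy", "diagram": "d1"},
    ],
}
FOREIGN_KEYS = {"State": {"diagram"}}   # table -> columns to be typed as IDREF


def tables_to_xml(tables):
    """Emit one element per row; no nesting, so no spanning tree is chosen."""
    root = ET.Element("database")
    for table, rows in tables.items():
        for row in rows:
            ET.SubElement(root, table, row)   # each column becomes an attribute
    return ET.tostring(root, encoding="unicode")


def xml_to_tables(text):
    """Inverse conversion: group row elements back into tables by tag name."""
    tables = {}
    for elem in ET.fromstring(text):
        tables.setdefault(elem.tag, []).append(dict(elem.attrib))
    return tables


def dangling_refs(tables, foreign_keys):
    """List (table, row id, column) for foreign keys that match no row id."""
    ids = {row["id"] for rows in tables.values() for row in rows}
    return [
        (table, row["id"], column)
        for table, rows in tables.items()
        for row in rows
        for column in foreign_keys.get(table, ())
        if row[column] not in ids
    ]


xml_text = tables_to_xml(TABLES)
print(xml_text)
print(xml_to_tables(xml_text) == TABLES)       # True
print(dangling_refs(TABLES, FOREIGN_KEYS))     # []
```

Because the mapping is one row to one element, the round trip is unique; a tree-structured dump would instead nest some rows under their parents and keep IDREF attributes only for the remaining foreign keys.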
[See $PH/DataModels05fr1.ppt or $PH/DataModels05fr1.mht slides on bde2sch conversion and on pfkey compression]
---------------------------------
3. The real high-risk factors in COOL
--------------------------------------------------
Robert Almonte (04s522, 05f523) certainly identified lots of risks to COOL system development, particularly with documentation and the learning curve for code-generation tool maintainers, not to mention the problem of re-integrating genv13 C-code generator enhancements with the more advanced C++ code generator gencpp.
The same problems arise in COOL-LCP, which models behavior using state diagrams for event-driven control flow. This is much more application-sensitive and API-oriented: the LCP interpreter has few platform-porting problems, until we attempt distributed concurrent processing.
I believe LCP's high-risk factors are distributed event communication channels and surrogates, namespace problems and the absence of shared memory, avoidance of race conditions during event dispatching, and security compromises for collaboration.
LCP has an event message encoding problem that is analogous to the chgen schema format problem: a choice of encodings for event messages. For these, Almonte again correctly identifies XML as a good candidate. But again, conversion of the EventInstance message format is a low-risk problem, with a first cut being the same schema-driven converter that any other schema table can use. The important thing about these messages is that they use the same data modeling protocol as GEN data and LCP state models.
Distributed setGame project refs: http://www.setgame.com and $PH/05f523/04fdsg_raReportR05f.doc