Subject: COOL Framework for MDD, and SWEng risk-aversion

CS Dept mini-seminar Wed Sep 21 

RJLRef: $PH/05f523/RJLtalk2CS050921.htm [,.txt]

 

This is what I would like to cover in my 10-min talk at the CS seminar on faculty research areas (on Wed 9/21 3-4PM in OS311).

But alas, time is too short :-(

 

I teach project-oriented OOAD/SWEng courses (91.522/523/524?).

These all involve an ongoing legacy code project called the COOL Framework for what is now called Model Driven Development, or MDD.

 

[If this sounds like a sales pitch for COOL, it is: There is plenty of applied research as well as development to do on MDD tools and their integration.]

 

MDD is the latest buzzword for computer-based aids for software development. OMG's standard UML2.0 (Unified Modeling Language) calls it MDArchitecture (MDA); the Open Group generalized it to MDD to avoid some OMG constraints. Microsoft calls it Software Factories using Domain-Specific Languages (DSL).

 

The goal of MDD is code generation from static data models (EERD or UML Class Diagrams) and from dynamic behavior models (Control-State Transition diagrams). COOL has three components: GEN, LCP and BDE. All three are layered together and make use of one another. My goal is to use them as a test-bed for MDD concepts by practicing what I preach (i.e., using COOL tools to evolve COOL itself).

 

The skills that go into MDD include information modeling (as in relational database design and OOP Design Patterns), data-driven applications (as in lexical analyzers and compilers), and of course O-O Programming (as in C++ or Java), as well as threads and concurrent processing for distributed applications.

 

-----------------------------------------------------


 

2. Goals of COOL Framework for MDD

 

The goal of MDD is automatic code generation  from model-based software designs. Commercial developers  are used to generating access code from models for static data structures such as Entity-Relationship Diagrams  for relational databases. 

 

Embedded system developers are familiar with state machine models for dynamic behavior. They have been using state-machine models for years (e.g., iLogix, Kennedy-Carter, PathFinder Solutions, and BridgePoint/Project Technology, now Accelerated Technology, part of Mentor Graphics).

 

COOL is my own  3-component architecture for Model-Driven-Development (MDD).  The Applied Math  side of my brain loves the abstract modeling aspect  of MDD;  my engineering side wants to build things, and experience makes me believe that the real problems (and pitfalls) of MDD can only be appreciated by actually implementing COOL as a prototypical version of MDD, and applying it to non-trivial applications.

 

What could be more non-trivial as applications than the components of COOL itself?  That question can only be answered by applying COOL to real-world applications, after extending COOL to allow collaborative development  by a team of distributed and dispersed software engineers (simulated by a network of students :-)


 

Design-time  Advantages of COOL Framework

 

COOL components can turn a graphic design into an effective prototype simulation when augmented by writing low-level application-specific state action routines:

·       COOL can support hyper-linked browsing through multiple levels of the design.

·       COOL development targets support for database-change-logging and replay to

o      trace execution at the state model level (in LCP-based prototypes)

o      replay the log backward for repeated Undo (e.g. in the BDE editor).

o      do regression testing (event-driven and otherwise unreproducible)

o      measure test coverage at the flow control (STD) level.

·       COOL research targets distributed system development and execution via

o      synchronous collaborative data exchange

o      database replication at distributed sites

o      on-line synchronous interaction among multiple distributed clients
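The change-logging bullets above can be sketched roughly as follows. This is only an illustration in Python, not COOL's actual logging code: the names Change, LoggedTable, set_field, undo and replay are hypothetical. Each update is logged with its old and new value, so the log can be replayed forward (to reproduce a run) or backward (for repeated Undo).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Change:
    """One logged update: which row and field changed, and both values."""
    key: str
    field_name: str
    old: Any
    new: Any


@dataclass
class LoggedTable:
    """Hypothetical in-memory table that logs every field change."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def set_field(self, key: str, field_name: str, value: Any) -> None:
        row = self.rows.setdefault(key, {})
        self.log.append(Change(key, field_name, row.get(field_name), value))
        row[field_name] = value

    def undo(self) -> bool:
        """Revert the most recent change; return False if the log is empty."""
        if not self.log:
            return False
        change = self.log.pop()
        row = self.rows[change.key]
        if change.old is None:
            # Assumption: None means the field did not exist before.
            del row[change.field_name]
        else:
            row[change.field_name] = change.old
        return True


def replay(log: list) -> LoggedTable:
    """Rebuild a table by applying logged changes in order (forward replay)."""
    table = LoggedTable()
    for change in log:
        table.set_field(change.key, change.field_name, change.new)
    return table


table = LoggedTable()
table.set_field("block1", "x", 10)
table.set_field("block1", "x", 25)
saved_log = list(table.log)   # snapshot of the log for later replay
table.undo()                  # backward replay of one step
replayed = replay(saved_log)  # forward replay of the whole log
```

A regression test in this style would compare the table produced by replay against a table saved from an earlier run.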

-------------------------------------

 

LCP State Model Interpreter

 

LCP is an interpreter and simulator for State Transition Diagrams. Like flowcharts, these can be nested in a hierarchy and (as every compiler student knows) can be called recursively. They help visualize the control branching that is often hidden by deferred testing of flag variables.

(As a first cut, Boolean flags are replaced by state code bits. Then simplify the resulting STD.)
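A small sketch of that first cut, in Python and with made-up names (quoted_flags, quoted_states), assuming a toy task: extract double-quoted strings, with backslash escapes inside the quotes. The flag version hides its control flow in two Booleans; the state version packs them into a state code and drops the one combination that can never occur.

```python
def quoted_flags(text):
    """Flag-based version: control flow depends on two Boolean variables."""
    in_quote = False
    escaped = False
    found, current = [], []
    for ch in text:
        if in_quote:
            if escaped:
                current.append(ch)
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                found.append("".join(current))
                current = []
                in_quote = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quote = True
    return found


# State code bits are (in_quote, escaped). The code 0b01 (escaped but not
# in a quote) is unreachable, so simplifying the STD leaves three states.
PLAIN, QUOTED, ESCAPE = 0b00, 0b10, 0b11


def quoted_states(text):
    """Same behavior with one explicit state variable."""
    state = PLAIN
    found, current = [], []
    for ch in text:
        if state == PLAIN:
            if ch == '"':
                state = QUOTED
        elif state == QUOTED:
            if ch == "\\":
                state = ESCAPE
            elif ch == '"':
                found.append("".join(current))
                current = []
                state = PLAIN
            else:
                current.append(ch)
        else:  # ESCAPE: take the next character literally
            current.append(ch)
            state = QUOTED
    return found


sample = 'say "hi" and "a\\"b"'
```

Both functions should return the same list for any input; the second one maps one-to-one onto a three-node state diagram.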

 

However, state models are most useful to represent cooperative multi-processing or threads; i.e., they can describe control flow at any level of granularity:

·       At a coarse level they can model distributed systems whose states wait for client/server requests with unpredictable order and time-of-arrival.

·       At a fine level they can show conditional and iterative execution of basic blocks of non-branching code which compilers love to identify as they parse your source code.  (Martin in the UK has reverse-engineered 60K lines of assembly code into such models.)

 

·       Simulating State Models is at least as old as Harel's StateCharts and his (expensive!) iLogix' StateMate tool. The challenge is to integrate GEN and LCP with a GUI in a computer-aided design environment (framework).

 

 

-------------------------------------------

Version Control for the Design Database

 

ALL three COOL components are driven by design data that can be stored in any vendor's RDB. This alone might justify using XML as an intermediate format. (Currently I have a hard time curbing student enthusiasm for this :-)

 

However, for highly data-driven software prototyping applications, it makes more sense to use a source code version control system like CVS/RCS for external storage. The rationale here is that design data defines runtime data structures from which source code is either automatically generated by GEN, or interpreted by LCP.

 

Regardless of how this design data is captured, it must evolve through a branching tree of versions just like the source code itself. Hence COOL can use a version control system (CVS) to maintain its design database and not just source code. Efficient version control depends on saving incremental differences; GEN's output code manages a persistent relational database whose normalized flat tables store ASCII-formatted and line-oriented data that is CVS-compatible.
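To illustrate why line-oriented flat tables suit a diff-based version control system, here is a minimal Python sketch. The '|'-separated row format, the dump_table helper and the sample State rows are assumptions made for this example; they are not the actual format written by GEN's output code.

```python
import difflib


def dump_table(rows, columns):
    """Render rows as '|'-separated ASCII lines, one row per line.

    Lines are sorted as text so that file order is stable between versions.
    """
    lines = ["|".join(str(row[c]) for c in columns) for row in rows]
    return sorted(lines)


columns = ["id", "name", "next_state"]
v1 = [
    {"id": 1, "name": "Idle", "next_state": 2},
    {"id": 2, "name": "Filling", "next_state": 3},
    {"id": 3, "name": "Done", "next_state": 1},
]
v2 = [dict(row) for row in v1]
v2[1]["name"] = "Mixing"  # one design change between versions

diff = list(
    difflib.unified_diff(
        dump_table(v1, columns), dump_table(v2, columns), "v1", "v2", lineterm=""
    )
)
# Keep only the added/removed rows, skipping the '---' and '+++' file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Because each row is one line, a one-row design change shows up as one removed and one added line, which is the kind of incremental difference a tool like CVS stores efficiently.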

 


 

COOL component integration:

 

The GEN component produces code to manipulate data structures, such as field get and set operators, iterators over data sets, and persistent storage and retrieval. It can customize this code to arbitrary database models, which the modeler first normalizes in graphics or text form. Generating database manipulation code from data models is state of the art today; it is no longer a novelty. The challenge is to integrate both data and state models into a system that supports both design capture for a prototype and its runtime evaluation.
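As a toy illustration of generating accessor code from a data model (not GEN itself, which targets C, C++ and Java), the Python sketch below emits source text for a class with get and set methods per field. The generate_class function and the one-entity schema are hypothetical.

```python
def generate_class(entity, fields):
    """Return Python source for a class with get/set methods per field."""
    args = ", ".join(f"{name}=None" for name in fields)
    lines = [f"class {entity}:", f"    def __init__(self, {args}):"]
    for name in fields:
        lines.append(f"        self._{name} = {name}")
    for name in fields:
        lines.append(f"    def get_{name}(self):")
        lines.append(f"        return self._{name}")
        lines.append(f"    def set_{name}(self, value):")
        lines.append(f"        self._{name} = value")
    return "\n".join(lines) + "\n"


# Hypothetical data model: one entity class with two fields.
schema = {"State": ["id", "name"]}

namespace = {}
for entity, fields in schema.items():
    source = generate_class(entity, fields)
    exec(source, namespace)  # compile the generated code for immediate use

State = namespace["State"]
s = State(id=1, name="Idle")
s.set_name("Filling")
```

A real generator would also emit iterators and persistence code, and would write the source to files for a compiler rather than executing it in place.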

 

Smooth inter-operation of COOL components has been a goal from day one:

 

·       GEN depends on a common data model that it shares with LCP and BDE. (GEN even bootstraps itself from an earlier version, to internalize metadata and use it to produce data management code.)

 

·       LCP and BDE both depend on GEN to produce code for their internal workings.

 

·       BDE can depend on LCP to execute its control flow (we currently hide this dependency in comments). BDE can also be used as an output display for suitable application prototypes.

 

·       The above dependencies are one-way, not circular. BDE closes the loop offline by capturing design data for GEN to convert to compilable code and for LCP to bind and interpret.

 

·       BDE is designed to take advantage of LCP state models although BDE does not yet use LCP. LCP can use BDE as a GUI (display-only mode currently). Input metadata for GEN and LCP can be derived from BDE diagrams, although both currently use textual definition work-arounds.
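The one-way build dependencies listed above can be checked mechanically. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the graph is my reading of the first three bullets only, and it deliberately leaves out the runtime use of BDE as a display for LCP.

```python
from graphlib import CycleError, TopologicalSorter

# component -> components it depends on (taken from the bullets above)
depends_on = {
    "GEN": {"data model"},
    "LCP": {"GEN"},
    "BDE": {"GEN", "LCP"},
}


def build_order(graph):
    """Return a dependency-first order; raises CycleError if circular."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(depends_on)
```

If someone later added a build-time dependency of GEN on BDE, build_order would raise CycleError instead of returning an order.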

 

 

GEN and LCP collaboration:

 

·        Code to manipulate data structures, such as field get and set operators, iterators over data sets, and persistent storage and retrieval, is generated by the GEN component.

 

·       Versions of GEN, in various stages of completion, can generate source code for C, C++ and Java. GEN also provides logging and replay of persistent data changes as they occur inside the application prototype.

 

·       The design data that LCP interprets represents (and can be derived from) State Transition Diagrams. LCP also functions as an event queue manager and dispatcher, to simulate a prototype for a possibly distributed concurrent  application.

 

·       In this way LCP controls the runtime behavior of processes, and provides run-time visibility at the level of state transitions and lower-level function calls.
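The interpreter role described above can be sketched in a few lines of Python. This is an assumption-laden toy, not LCP: the transition table, the valve actions and the Interpreter class are invented for illustration. The state model is plain data, events go through a queue, and every dispatch is traced at the state-transition level.

```python
from collections import deque

# Design data (hypothetical): (state, event) -> (next_state, action name)
TRANSITIONS = {
    ("Idle", "start"): ("Filling", "open_valve"),
    ("Filling", "full"): ("Idle", "close_valve"),
}

valve = {"open": False}


def open_valve(interp):
    valve["open"] = True


def close_valve(interp):
    valve["open"] = False


ACTIONS = {"open_valve": open_valve, "close_valve": close_valve}


class Interpreter:
    """Event queue manager and dispatcher driven by a transition table."""

    def __init__(self, transitions, actions, initial):
        self.transitions = transitions
        self.actions = actions
        self.state = initial
        self.queue = deque()
        self.trace = []  # (state, event, next_state, action) per dispatch

    def post(self, event):
        self.queue.append(event)

    def run(self):
        while self.queue:
            event = self.queue.popleft()
            key = (self.state, event)
            if key not in self.transitions:
                # No transition defined: record the event as ignored.
                self.trace.append((self.state, event, self.state, None))
                continue
            next_state, action = self.transitions[key]
            self.actions[action](self)  # call the compiled action routine
            self.trace.append((self.state, event, next_state, action))
            self.state = next_state


lcp = Interpreter(TRANSITIONS, ACTIONS, "Idle")
for event in ["start", "full", "full"]:
    lcp.post(event)
lcp.run()

# Coverage at the state-model level: fraction of transitions exercised.
taken = {(s, e) for s, e, _, action in lcp.trace if action is not None}
coverage = len(taken) / len(TRANSITIONS)
```

Actions receive the interpreter so that they can post follow-up events; a distributed version would replace the local queue with a communication channel.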

 

---------------------------------------------------------

 

Block Diagram Editor BDE

 

The third COOL component is a Block Diagram Editor BDE. The value of BDE is its ability to present the design as data models and state models in a two-dimensional form, as distinct styles of  directed graphs or block diagrams. BDE can create and edit multiple graphic diagram styles for inclusion in the design database:

 

o      Extended Entity-Relation  data models and/or UML class diagrams,

o      Dynamic behavior models (State Transition Diagrams).

o      Other diagram types can be generated but our focus has been on exploiting the semantics of  EERD's to generate code that manipulates data, and STD's to define the rules for control states and program sequencing.

o      BDE can traverse the equivalent of hyperlinks up and down multiple levels of abstraction.

 

Example: see the Juice Plant Demo at http://www.cs.uml.edu/~lechner/JP2html/

User Guide for BDE:      

          ../COOL-BDE/BDEUserGuide2005/bdeUG_2005.htm, bdeUG_2005.ppt

 

-----------------------------------------------

 

Tradeoffs among levels-of-abstraction :

 

Graphic data models must always trade off concrete completeness against big-picture context (info-hiding abstraction).

o      The big picture requires visualization of inter-class-level and object-level interactions (event and message flows).

 

o      Completeness requires attention at a fine-grain level of detail, and is best left to textual definitions of declarative and operational programs.

 

o      Complex systems require partitioning (separation of concerns) and decomposition (step-wise refinement). Complex data models (entity classes and associations) can be partitioned into sub-schema.

 

o      Nodes representing such sub-schema can be hyperlinked to lower-level ER model or subschema diagrams.

 

o      Entity class nodes on data models can include class method names that are hyperlinked to nested state diagrams that model these methods.

 

o      State models can be decomposed into smaller (and faster) sub-models or to leaf states that contain or hyperlink to source code to define their actions. The LCP interpreter calls the corresponding compiled routines.

 

o      Example: see the Juice Plant Demo at http://www.cs.uml.edu/~lechner/JP2html/

 

-----------------------------------------------


 

Student projects and the COOL Framework

 

In encouraging student projects, I often get highly motivated students who propose to migrate an existing legacy code base to the latest bandwagon technology.

 

I hate to discourage such highly motivated  students but my advice is to first find, and then concentrate on addressing, factors which have high risk compared to value  in such projects, in order to discover future problems as early as possible and do what is feasible within time and budget.

 

For example, XML formatting is a low-risk factor. Defining XML schema input to chgen/gencpp is perhaps the lowest-risk factor because I know how to do it: our database is a set of flat relational tables, and these tables can be converted to/from XML without first selecting a spanning tree (and risking a non-unique representation). This merely requires converting all foreign keys, those within a would-be spanning tree as well as those outside of it, to/from XML IDREF data items.

 

With a tree-structured pr_dump option (low-risk, TBD) any .dat file can be mapped directly into XML trees. Generating the code for these 2-way converters  for any schema.msdat file is an extension of chgen/gencpp. (XML's IDREF  types are used for the residual foreign keys outside of the spanning tree in what is in general a network of relationships.)

 

Again, this is a well-understood problem, hence low-risk and deferrable; it is not even on the direct path toward persistent data exchange (OMG/XMI), because tables (and messages) can be converted to/from XML. We don't need to first select a spanning tree (and risk its non-uniqueness). We merely convert all foreign keys within the database to/from XML IDREF data items, not just the ones outside of the spanning tree.
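A minimal sketch of such a flat two-way converter, using Python's standard xml.etree.ElementTree. The table contents, the FOREIGN_KEYS map and the "Table_key" naming of ID values are assumptions for illustration; this is not chgen/gencpp output. Every row becomes a direct child of the root element, and every foreign key becomes an IDREF-style attribute, so no spanning tree is selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat tables; every value is kept as text.
TABLES = {
    "State": [{"id": "1", "name": "Idle"}, {"id": "2", "name": "Filling"}],
    "Transition": [{"id": "1", "event": "start", "src": "1", "dst": "2"}],
}
# table -> {foreign-key column: referenced table}
FOREIGN_KEYS = {"Transition": {"src": "State", "dst": "State"}}


def tables_to_xml(tables, foreign_keys):
    """Emit one element per row, all directly under the root (no nesting)."""
    root = ET.Element("database")
    for table, rows in tables.items():
        fks = foreign_keys.get(table, {})
        for row in rows:
            attrs = {}
            for column, value in row.items():
                if column == "id":
                    attrs["id"] = f"{table}_{value}"          # ID-style value
                elif column in fks:
                    attrs[column] = f"{fks[column]}_{value}"  # IDREF-style value
                else:
                    attrs[column] = value
            ET.SubElement(root, table, attrs)
    return root


def xml_to_tables(root, foreign_keys):
    """Inverse conversion: strip the table-name prefixes from IDs and IDREFs."""
    tables = {}
    for elem in root:
        fks = foreign_keys.get(elem.tag, {})
        row = {}
        for column, value in elem.attrib.items():
            prefix = elem.tag if column == "id" else fks.get(column)
            if prefix is not None:
                value = value[len(prefix) + 1:]
            row[column] = value
        tables.setdefault(elem.tag, []).append(row)
    return tables


xml_text = ET.tostring(tables_to_xml(TABLES, FOREIGN_KEYS), encoding="unicode")
round_trip = xml_to_tables(ET.fromstring(xml_text), FOREIGN_KEYS)
```

The table-name prefix is there because XML ID values must be names and must be unique across the whole document, while primary keys are only unique within their own table.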

 

[See $PH/DataModels05fr1.ppt or $PH/DataModels05fr1.mht slides

on bde2sch conversion and on pfkey compression]

 

 

---------------------------------

 

 

 

3. The real high-risk factors in COOL

--------------------------------------------------

 

Robert Almonte (04s522, 05f523) certainly identified lots of risks to COOL system development, particularly with documentation and the learning curve for code-generation tool maintainers, not to mention the problem of re-integrating genv13 C-code generator enhancements with the more advanced C++ code generator gencpp.

 

The same problems arise in COOL-LCP, which models behavior using state diagrams for event-driven  control flow. This is much more application-sensitive and API-oriented: The LCP interpreter has few platform-porting problems,  until we attempt distributed concurrent processing.

 

I believe LCP's high-risk factors are distributed event communication channels and surrogates, namespace problems and the absence of shared memory, avoidance of race conditions during event dispatching, and security compromises for collaboration.

 

LCP has an event message encoding problem that is analogous to the chgen schema format: a choice of encodings for event messages. For these, Almonte again correctly identifies XML as a good candidate. But again, conversion of the EventInstance message format is a low-risk problem, with a first cut being the same schema-driven converter that any other schema table can use. The important thing about these messages is that they use the same data modeling protocol as GEN data and LCP state models.

 

Distributed setGame project refs: http://www.setgame.com

and $PH/05f523/04fdsg_raReportR05f.doc