RJLRef: $PH/COOL-BDE/RJLFdbk2JTan050926.doc

Message 7/993 From Jing Tan Sep 26, 05 04:58:10 PM +0000
To: lechner@cs.uml.edu
Cc: jtan@cs.uml.edu
Subject: RE: bde2java04fStatus050924.txt and project proposals

Hi, Prof Lechner,

In order to merge genjava DB with bde2java to become bdegenjava, I have another idea which currently most companies are using.

[RJL> These brackets enclose my comments on Jing Tan's proposal. My comments assume that ALL 05f523 students are by now familiar with bde's User Guide (esp. its data models) as well as chgen's User Manual, most of which is applicable to gencpp. These references WILL be important to the hour exam next week, Oct. 4. This exam is open-book, so bring them along.

Remember what was said by Gerry Weinberg (author of "The Psychology of Computer Programming" and "Are Your Lights On?"): "If you can't think of at least three questions to ask about a problem, then you don't understand the problem."

What do you mean by 'select which program to use'? (what program, what use, what legacy code?) Is this conclusion based on comparing bde2java to bde/src code? Or are you comparing their generated output code? Or are you comparing genjava to gencpp and chgen internals? We already generate code for 3 target languages by feeding the same .sch file to genjava or gencpp or chgen. Genjava was based on gencpp, which extended chgen, so I agree they could be merged rather easily.

Q1: "to generate java/cc/c code" seems to refer to rebuilding pr_util/pr*.c or its gencpp/genjava equivalent, after a schema.sch data model change from a client. I believe that the same code for any target language can be generated by any of our current front-end schema parsers or .msdat file loaders. Chgen13 is the most stable, complete (with log/replay) and well-documented one. I do not think this is a reason for java interactivity at all, because data models are pretty stable, and top-down extensions (to new components or subclasses) are relatively easy.
I think this is an off-line build process, and multi-version build scripts using make are relatively easy. Bde's 94sbdeschema.sch hasn't changed for 10 yrs (except for side project branches, where I did not try hard enough to make chgen upgrades catch up to match desired pr_util changes.)

The need for graphic input is reduced by the existence of a text-based work-around for each: .sch file input to chgen/gen* for EERDs, textual declarations to generate STD database content in LCP for STDs.

The real reason for interactivity is to support distributed collaborative use of COOL to design and test new software prototypes. When design capture IS the application, this means saving (a version tree of) design data files ('application' design and test data). So the client wants to send volatile design .dat or .bde files (and eventually state action method code changes), not stable EERDs or STDs, up and down the client/server pipeline. So in this respect, your words are correct: 'pass the graphic data to server'. Currently bde2java uses ftp/winscp for this, since a local source file cannot be edited by a local client and then saved on a remote server, or vice versa. Source and target files must BOTH be at the client, or (with less security?) both be at the server.

I conclude that you propose to upgrade the server side functionality of bde2java to target multiple target languages. Not a bad idea; no doubt this code can eventually become an auto-generatable, if not invariant, part of the code output from gencpp or genjava.

But the critical point is that code generation by gencpp/genjava/chgen is a pre-compile-time function. It will generate source (.java/.cc) code from the application schema, and that source needs compilation to .class/.object code in advance. After compilation I expect to obtain orders of magnitude payoff from repetitive pr_util code re-use and reduced debug effort.
Code generation from schema is a large-scale, long-duration, infrequent transaction; code re-use AFTER compilation is a frequent interactive process (typically for prototype testing) and I can imagine many distributed multi-client interactive scenarios for it.

Why not just transport the .msdat metadata equivalent? (From genxml/CGeggis) What answers do you propose to put in the XML format that aren't present in the current schema.sch format? After you read the chgen User Manual (v8 or 9 or 10log) the right questions will change to "what info must be present in the .sch file", not how that info should be represented.

And pairing every field value with its name in each .dat file table row instance is NOT the right way to visualize multi-rowed RDB data tables for test I/O purposes. Pretty-printing of column headers would be more effective in my opinion. These could be derived from meta-data field TTcurr->TAcurr->fieldname, which could be inherited as static class data members. Extending views from subsets of tables to subsets of columns (fields) for human consumption would be another useful gen* extension.

In conclusion, I agree that augmenting the SV-->TT-->TA metadata model with metadata containing more formal constraints would have long-range value. But I'd rather defer this generalization of GEN: these constraints include value range expressions and domain-specific tables to define discrete sets and intervals of allowable data field values. CDIF and some UML components did include such constraint data. I also believe that *gen* should eventually include code to auto-generate parsers that validate data field values.

As a short-range added-value project, I agree that *gen* extensions to augment the descriptive text field of Table TA (and TT), e.g. by allowing multi-line comment fields, would have immediate value as documentation for humans. I would capture this data by extending the meta-schema SV-->TT-->TA to include the XB Text Block extension from the bdesym project.
This way more automation could be applied, like a symbol table and a report generator for cross-references and spelling checks. Multi-line comments in TA-rows (in the last field of TA) could describe the declared attribute's meaning and/or value range. This could also solve similar problems in pr_util text management applications. Bdesym handled this by extending the fscanf parser in pr_load.c.

If a TA-row that describes a scalar data item needs a paragraph or page of explanatory text, this text can be captured as a multi-line paragraph or a multi-K string of tokens as a child-set of TA. This extends the meta-schema EERD just like the HA-child-set of HN or the GX-child-set of CG extends BDE's EERD model for diagram capture. So meta-data and data converge toward each other.

Don't forget that textual definition versions also need to be saved in a line-oriented version-control system. No one to my knowledge has proposed incremental difference storage algorithms for XML-formatted text under line-oriented version control systems. This performance would be a problem for Distributed BDE, when two users edit the same diagram concurrently (and, we hope, safely, by enforcing CVS-style optimistic concurrency and using VOIP to enhance other-user-awareness and stay out of each other's way).

My answer above was NO, for what I think is JingTan's purpose. I believe you are still thinking about database field-value-changing transactions, not long-duration diagram-editing transactions with complex graphics. However, I am not opposed to redesigning DBDE's client/server interface that synchronized a replicated database by uploading logging transactions (short and numerous) by which the master database tracked updates after some latency, and then rebroadcast its 'official' changes back to all clients to re-synchronize them, while allowing concurrent edits to proceed :-).
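To make the log-upload and rebroadcast idea concrete, here is a minimal sketch in Python. It is NOT DBDE's actual interface: the names Txn, Master and Replica, the per-key version counters, and the in-memory log are all assumptions made for illustration only. Each client uploads short logged transactions; the master accepts one only if the client edited the latest version of that item (CVS-style optimistic concurrency), and clients replay the master's 'official' log to re-synchronize.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """One short logged edit (hypothetical record layout)."""
    client: str        # id of the submitting client
    key: str           # item being edited, e.g. a diagram node id
    value: str         # new field value
    base_version: int  # version of `key` the client saw when it edited


@dataclass
class Master:
    data: dict = field(default_factory=dict)          # key -> value
    versions: dict = field(default_factory=dict)      # key -> version counter
    official_log: list = field(default_factory=list)  # accepted changes, in order

    def submit(self, txn: Txn) -> bool:
        """Accept txn only if nobody changed `key` since the client read it."""
        current = self.versions.get(txn.key, 0)
        if txn.base_version != current:
            return False  # conflict: the client must roll back and re-sync
        self.data[txn.key] = txn.value
        self.versions[txn.key] = current + 1
        self.official_log.append((txn.key, txn.value, current + 1))
        return True


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)
    applied: int = 0  # count of official log entries already replayed

    def resync(self, master: Master) -> None:
        """Replay the official changes this replica has not yet seen."""
        for key, value, version in master.official_log[self.applied:]:
            self.data[key] = value
            self.versions[key] = version
        self.applied = len(master.official_log)
```

In this sketch, if clients a and b both edit the same item starting from version 0, the first submit succeeds and the second returns False; after b calls resync it sees the official value and can retry against the new version. A real design would still have to decide what goes into each transaction for graphic edits, and how the rebroadcast reaches clients.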
I am not sure what you mean here - I still need a definition of WHAT goes into each layer, to understand what you mean. [Ref: DBDE/Telecomm paper?]

In any case, I think you should concentrate on client/server interactions after *gen* code is generated and the application (e.g. DBDE) is built and running. That's where the interesting problems begin: modest volumes of short transactions from relatively few concurrent users, when performance bottlenecks are visible to clients using online graphics where fast visual feedback is needed locally but rollback is occasionally required due to conflicts among concurrent users.

And thank YOU - for a stimulating set of new ideas.]