The DocGen Project


Automatic generation of on-line software documentation


Aim: Pilot implementation of a system producing on-line software documentation from software specifications created with a CASE tool

Central Issues:

The Document Generation system (DocGen) produces hypertext documentation for managers, system analysts, designers and programmers, and eventually for end-users. The software specifications are created with OBLOG (OBject-oriented LOGic), an object-oriented CASE tool that originated as a research idea at INESC, Lisbon, and is currently under industrial development at OBLOG Software S.A., Portugal. Since OBLOG is a multiuser environment, we aim to generate multiple views from the common software specifications. The resulting document set, in HTML format, can be browsed on-line with a browser such as Netscape or converted to paper-based documentation (although the main advantages of the hypertext organisation are then lost).

Some of the expected advantages of natural language (NL) generation techniques over hand-written documentation are the following:


Architecture:

Interface with OBLOG
Pre-processing starts from an internal representation of the OBLOG specifications, stored in a relational database. From there we select information, filter it, and feed it into a domain model in which the specifications are represented as Prolog facts, similar to Prolog encodings of AI knowledge representation formalisms. Generation proper thus starts from the domain model, an AI-oriented description extracted from the OO specifications.
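The select, filter and encode steps can be illustrated with a minimal sketch. It is not DocGen's actual code: the row layout, the `is_internal` flag, the `attribute` predicate and all names below are assumptions made for illustration, and Python is used only to show the shape of the transformation into Prolog-like facts.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A domain-model fact: a predicate name plus its arguments."""
    predicate: str
    args: tuple


# Hypothetical rows standing in for the relational encoding of a specification.
# Assumed columns: (class name, attribute name, attribute type, is_internal)
ATTRIBUTE_ROWS = [
    ("account", "balance", "money", False),
    ("account", "owner", "string", False),
    ("account", "lock_flag", "bool", True),
]


def rows_to_facts(rows):
    """Filter the selected rows and encode the survivors as facts."""
    facts = []
    for cls, attr, attr_type, is_internal in rows:
        if is_internal:
            # Filtering step (assumed criterion): skip internal details.
            continue
        facts.append(Fact("attribute", (cls, attr, attr_type)))
    return facts


def to_prolog(fact):
    """Render a fact in Prolog-like surface syntax."""
    return f"{fact.predicate}({', '.join(fact.args)})."


facts = rows_to_facts(ATTRIBUTE_ROWS)
prolog_lines = [to_prolog(f) for f in facts]
```

With these sample rows, `prolog_lines` holds one fact per documented attribute, and the internal `lock_flag` attribute is filtered out.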
Content Planning
The DocGen preselector decides which parts of the design specifications are to be documented for the selected user type and goal. Some of the heuristics for content selection are:
  • A natural source of user-oriented information is the application's interface, i.e. buttons, list boxes, text messages, etc. The preselector therefore starts by selecting all interface elements and then follows all calls and parameters related to them. We can also infer some information about the relevance and purpose of the interface elements from their grouping;
  • Other important application characteristics are the states where the application waits for some condition to be met, e.g. a user reaction. Hence, all such situations from the behaviour diagrams are selected as meaningful, together with the transitions triggered by these states;
  • All persistent OBLOG objects (i.e. objects where data is stored) are called TaBLe-objects and are saved in relational databases. So all such objects and their attributes are included in the domain model as well.
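The three heuristics above can be read as a seeded traversal of the specification: start from interface elements, waiting states and persistent objects, then follow what they call or trigger. The sketch below is a simplification under stated assumptions. The element kinds, the `SPEC` dictionary and the `preselect` function are hypothetical, and all outgoing references are followed uniformly rather than distinguishing calls, parameters and transitions.

```python
# Hypothetical specification: element name -> (kind, names it calls or triggers)
SPEC = {
    "ok_button": ("interface", ["save_order"]),
    "save_order": ("method", ["order_table"]),
    "order_table": ("table_object", []),
    "await_input": ("wait_state", ["validate"]),
    "validate": ("transition", []),
    "log_helper": ("method", []),
}

# Kinds that the heuristics treat as starting points for selection.
SEED_KINDS = {"interface", "wait_state", "table_object"}


def preselect(spec):
    """Return the names of all elements reachable from the seed elements."""
    selected = set()
    stack = [name for name, (kind, _) in spec.items() if kind in SEED_KINDS]
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        selected.add(name)
        # Follow everything this element calls or triggers.
        stack.extend(spec[name][1])
    return selected


selected = preselect(SPEC)
```

In this example `log_helper` is never reached from a seed element, so it is left out of the content to be documented.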
Text Organisation
During the text organisation phase the selected content is organised into separate hypertext nodes, all hypertext links are identified and each node is structured internally. The output of this stage is a set of structured documents planned down to the sentence level.
Schemata: The decisions on formatting and on the inclusion of graphics and related material are already made. To ensure coherence we use document and section organisation schemata, since the structure of software documentation is relatively invariant. DocGen maintains different organisation schemata for different types of classes, as well as for indexes, cross-reference nodes and the main (starting) node, where an overview of the classes and their interactions is given.
Links: At present DocGen establishes links to classes, their attributes and methods.
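A schema-driven node plan with links to attributes and methods might look like the following sketch. The schema contents, the class types `table_class` and `interface_class`, the file naming scheme and the functions `plan_node` and `render` are all assumptions for illustration and do not describe DocGen's real schemata.

```python
from html import escape

# Hypothetical organisation schemata: class type -> ordered section names
SCHEMATA = {
    "table_class": ["overview", "attributes"],
    "interface_class": ["overview", "attributes", "methods"],
}


def anchor(cls_name, member):
    """Assumed link target: one HTML file per class, one anchor per member."""
    return f"{cls_name}.html#{member}"


def plan_node(cls_name, cls_type, attributes, methods):
    """Plan one hypertext node by instantiating the schema for its class type."""
    members = {"attributes": attributes, "methods": methods}
    sections = []
    for section in SCHEMATA[cls_type]:
        # Sections without members (e.g. the overview) carry no links here.
        links = [(m, anchor(cls_name, m)) for m in members.get(section, [])]
        sections.append((section, links))
    return {"node": f"{cls_name}.html", "sections": sections}


def render(plan):
    """Render the planned sections and links as a minimal HTML fragment."""
    parts = []
    for title, links in plan["sections"]:
        parts.append(f"<h2>{escape(title)}</h2>")
        for label, href in links:
            parts.append(f'<a href="{escape(href)}">{escape(label)}</a>')
    return "\n".join(parts)


plan = plan_node("Order", "table_class", ["id", "total"], ["save"])
html_text = render(plan)
```

Because the assumed `table_class` schema has no methods section, the `save` method is not linked from this node, whereas a class of type `interface_class` would get all three sections.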
Lexical choice
Since technical documentation is meant to be unambiguous, DocGen should always map one concept to one term. This greatly simplifies the problem of lexical choice, which is crucial in other domains. At present DocGen uses the internal labels given by the programmer. It also includes in the output file all comments present in the specifications, since they may help reveal the encoded semantics.
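The one-concept-one-term constraint can be sketched as a small lexicon that fixes a concept's term on first use and rejects any attempt to reuse that term for another concept. The `Lexicon` class and the concept identifiers are hypothetical. The sketch only illustrates the constraint, taking the programmer's label as the term, and does not reflect how DocGen stores its terminology.

```python
class Lexicon:
    """Keeps a one-to-one mapping between concepts and terms."""

    def __init__(self):
        self._term_of = {}      # concept -> term
        self._concept_of = {}   # term -> concept

    def term_for(self, concept, label):
        """Return the term for a concept, registering the label on first use."""
        if concept in self._term_of:
            # The concept already has a term, so later labels are ignored.
            return self._term_of[concept]
        owner = self._concept_of.get(label)
        if owner is not None:
            raise ValueError(f"term {label!r} already denotes {owner!r}")
        self._term_of[concept] = label
        self._concept_of[label] = concept
        return label


lexicon = Lexicon()
first = lexicon.term_for("class:order", "Order")
second = lexicon.term_for("class:order", "Purchase")
```

Both calls yield the same term for `class:order`, so generated text never alternates between synonyms, and calling `term_for` with the label `Order` for a different concept raises a `ValueError`.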


Publications:

K. Bontcheva and G. Angelova. Planning and Generating Hypertext Documentation. In: Proceedings of the Workshop "Gaps and Bridges in Natural Language Generation" (W11), European Conference on Artificial Intelligence ECAI-96, Budapest, Hungary, August 1996, pp. 25-28.


Last Updated by Kalina Bontcheva on 23 May, 1996.