The DocGen Project
Automatic generation of on-line software documentation

Aim: Pilot implementation of a system producing on-line software documentation from software specifications created with a CASE tool
The Document Generation system (DocGen) produces hypertext
documentation for managers, system analysts, designers and
programmers, and eventually for end-users. The software specifications
are created with OBLOG (OBject-oriented LOGic), an object-oriented CASE
tool which originated as a research idea at INESC, Lisbon, and is
currently under industrial development at OBLOG Software S.A.,
Portugal. Since OBLOG is a multi-user environment, we aim to generate
multiple views from the common software specifications. The resulting
document set, in HTML format, can be browsed on-line with a browser
such as Netscape, or converted to paper-based documentation (although
the main advantages of the hypertext organisation will then be lost).
Some of the expected advantages of natural language (NL) generation
techniques over hand-written documentation are the following:
- Information contained in the application's technical documentation
(e.g. the reference guide) can be generated automatically.
- The generated documentation will have a uniform layout conforming to
the company's requirements.
- The documentation generated at various stages of the software design
will form the history records of the application almost without human
effort. Additionally, the documents' content will always conform to
the current design status.
- A new designer can easily understand the overall system architecture
and browse the existing classes without having to find their way
through the many graphical screens of the formal specifications.
- Up-to-date, complete reference documentation can facilitate the reuse
of available modules.
Architecture
- Interface with OBLOG
- Our pre-processing starts from an internal representation of the
OBLOG specifications, stored in a relational database. From there we
select information, filter it, and feed it into a domain model where
the specifications are represented as Prolog facts, resembling the
Prolog encodings of AI knowledge representation formalisms. In this
way, documentation generation proper starts from the domain model,
which is an AI-oriented description extracted from the OO
specifications.
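As an illustration only, the select, filter and encode steps described
above might look like the following Python sketch. The table layout (an
`attributes` table with an `internal` flag), the `attribute/3` fact
shape and all function names are assumptions made for this example; the
actual OBLOG database schema and the predicates of the DocGen domain
model are not given on this page.

```python
import sqlite3


def prolog_atom(value):
    """Quote a value as a Prolog atom, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def rows_to_facts(conn):
    """Select attribute rows, filter out internal ones, emit Prolog facts."""
    # Hypothetical schema: attributes(class_name, attr_name, attr_type, internal)
    query = (
        "SELECT class_name, attr_name, attr_type FROM attributes "
        "WHERE internal = 0 ORDER BY class_name, attr_name"
    )
    return [
        f"attribute({prolog_atom(c)}, {prolog_atom(a)}, {prolog_atom(t)})."
        for c, a, t in conn.execute(query)
    ]


def demo_facts():
    """Build a tiny in-memory database and return the generated facts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attributes "
        "(class_name TEXT, attr_name TEXT, attr_type TEXT, internal INTEGER)"
    )
    conn.executemany(
        "INSERT INTO attributes VALUES (?, ?, ?, ?)",
        [
            ("Account", "balance", "money", 0),
            ("Account", "cache_slot", "int", 1),  # filtered out as internal
            ("Customer", "name", "string", 0),
        ],
    )
    facts = rows_to_facts(conn)
    conn.close()
    return facts
```

Names are emitted as quoted atoms so that capitalised class names are
not read as Prolog variables.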
- Content Planning
- The DocGen preselector decides which parts of the design
specifications are to be documented for the selected user type and
goal. Some of the heuristics for content selection are:
  - A natural source of user-oriented information is the application's
  interface, i.e. buttons, list boxes, text messages, etc. So the
  preselector starts by selecting all interface elements and then
  follows all calls and parameters related to them. We can also infer
  some information about the relevance and purpose of the interface
  elements from their grouping;
  - Other important application characteristics are the states where
  the application waits for some condition to be met, e.g. a user
  reaction. Hence, all such situations from the behaviour diagrams are
  selected as meaningful, together with the transitions triggered by
  these states;
  - All persistent OBLOG objects (i.e. objects where data is stored)
  are called TaBLe-objects and are saved in relational databases. So
  all such objects and their attributes are included in the domain
  model as well.
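The three heuristics above can be sketched roughly as follows. This is a
minimal Python illustration, not the DocGen preselector itself: the
dictionary layout of `spec` and every key name in it are hypothetical,
and parameters are left out so that only calls are followed.

```python
from collections import deque


def preselect(spec):
    """Return the set of specification item names chosen for documentation."""
    # Heuristic 1: start from the interface elements and follow their calls.
    selected = set(spec["interface_elements"])
    queue = deque(sorted(selected))
    while queue:
        item = queue.popleft()
        for callee in spec["calls"].get(item, []):
            if callee not in selected:
                selected.add(callee)
                queue.append(callee)
    # Heuristic 2: waiting states and the transitions they trigger.
    for state in spec["waiting_states"]:
        selected.add(state)
        selected.update(spec["transitions"].get(state, []))
    # Heuristic 3: persistent objects together with their attributes.
    for obj, attrs in spec["persistent_objects"].items():
        selected.add(obj)
        selected.update(f"{obj}.{attr}" for attr in attrs)
    return selected


# Hypothetical specification fragment, for illustration only.
EXAMPLE_SPEC = {
    "interface_elements": ["ok_button"],
    "calls": {
        "ok_button": ["save_order"],
        "save_order": ["validate"],
        "unused_helper": ["log"],  # not reachable from the interface
    },
    "waiting_states": ["awaiting_input"],
    "transitions": {"awaiting_input": ["input_received"]},
    "persistent_objects": {"Order": ["id", "total"]},
}
```

In this example `unused_helper` and `log` are not selected, because no
interface element leads to them.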
- Text Organisation
- During the text organisation phase the selected content is organised
into separate hypertext nodes, all hypertext links are identified, and
each node is also structured internally. The output of this stage is a
set of structured documents planned down to the sentence level.
Schemata: At this point the decisions about formatting and the
inclusion of graphics and related material have already been made. To
ensure coherence we use document and section organisation schemata,
since the structure of software documentation is relatively invariant.
DocGen maintains different organisation schemata for different types
of classes, as well as for indexes, cross-reference nodes and the main
(starting) node, where an overview of the classes and their
interactions is given.
Links: At present DocGen
establishes links to classes, their attributes and methods.
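To make the idea of schema-driven organisation concrete, here is a
small Python sketch of how one class node could be planned and linked.
The schema contents, the file naming scheme and the function names are
all assumptions for illustration; the page does not specify DocGen's
actual schemata or link format.

```python
from html import escape

# Hypothetical schemata: an ordered list of sections per node type.
SCHEMATA = {
    "class": ["description", "attributes", "methods"],
    "index": ["entries"],
}


def node_filename(class_name):
    """Assumed naming scheme: one HTML file per class node."""
    return f"{class_name.lower()}.html"


def link(class_name, member=None):
    """Build a hypertext link to a class node or to one of its members."""
    href = node_filename(class_name) + (f"#{member}" if member else "")
    label = f"{class_name}.{member}" if member else class_name
    return f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'


def plan_class_node(cls):
    """Plan one node: sections in schema order, each a list of items."""
    plan = []
    for section in SCHEMATA["class"]:
        if section == "description":
            items = [cls["description"]]
        else:
            # Attributes and methods become links to member anchors.
            items = [link(cls["name"], member) for member in cls[section]]
        plan.append((section, items))
    return plan
```

Because the section order comes from the schema rather than from the
input, every class node of the same type is laid out in the same way.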
- Lexical Choice
- Since technical documentation is meant to be unambiguous, DocGen
should always map one concept to one term. This greatly simplifies
lexical choice, which is a crucial problem in other domains. At present
DocGen uses the internal labels given by the programmer. It also
includes in the output file all comments present in the specifications,
since they may help reveal the encoded semantics.
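The one-concept-one-term rule can be pictured as a lookup table that
refuses conflicting entries. The Python sketch below is only an
illustration under stated assumptions: the `Lexicon` class is
hypothetical, and the additional check that a term is never shared by
two concepts is our reading of "unambiguous", not something the page
states.

```python
class Lexicon:
    """Maps each concept to exactly one term.

    By assumption, each term also belongs to only one concept.
    """

    def __init__(self):
        self._term_of = {}
        self._concept_of = {}

    def register(self, concept, term):
        """Record the term for a concept, rejecting conflicting mappings."""
        existing = self._term_of.get(concept)
        if existing is not None and existing != term:
            raise ValueError(f"{concept!r} already has term {existing!r}")
        owner = self._concept_of.get(term)
        if owner is not None and owner != concept:
            raise ValueError(f"term {term!r} already denotes {owner!r}")
        self._term_of[concept] = term
        self._concept_of[term] = concept

    def term(self, concept):
        """Return the term; fall back to the programmer's internal label."""
        return self._term_of.get(concept, concept)
```

The fallback in `term` mirrors the current behaviour described above,
where the internal label itself is used as the term.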
Publications
K. Bontcheva and G. Angelova. Planning and Generating Hypertext
Documentation. In: Proceedings of the Workshop "Gaps and Bridges in
Natural Language Generation" (W11), European Conference on Artificial
Intelligence ECAI-96, Budapest, Hungary, August 1996, pp. 25-28.

Last Updated by Kalina Bontcheva on 23 May, 1996.