extract_dd

extract_dd extracts "embedded" detailed designs from carefully formatted C and C++ source files, allowing automatic generation of detailed design documents.

Index: Concepts, Enhancements, Function prolog, Header file, Implementation file, Input format, Keywords, Markers, Output format, Using.

Using extract_dd
Usage: extract_dd <file> [...]
extract_dd reads its input from the list of files specified on the command line. The order of the files is significant: files encountered first are processed first, and produce output before files read later. This matters most for header and implementation file pairs (see the description of the header/implementation relationship below). Output is written to standard output, which may be redirected or piped as required.

Embedded Detailed Designs
Detailed designs (DDs) contain important information for developers and maintainers. However, all too often the DD is not kept up to date as code is modified. Eventually the DD document falls so far out of step with the code that it becomes essentially useless. Embedded detailed designs are a refinement of "self-documenting" code, and are intended to allow maintaining the DD and the code in a single file. This way, when the code is modified it should be a simple process to update the DD at the same time, keeping the two in sync. Then, using a utility, the embedded DD can be extracted to satisfy documentation and milestone requirements.

Extracted DD Format
The extracted DD will be divided by module. Output for a module will be a module-wide section, and then one function-wide section for each function.
The module-wide section will include:

Module header
Description
Dependencies
External interface declarations

The section for each function will include:

Function prototype
Inputs
Outputs
Description
Logic flow

Information to populate these entries will of course be taken from the input files (see the description of the input below).
The formatting of the output will be rudimentary at best. extract_dd is not intended to produce finished documents, and it is expected that the output will be fed to a text processing program, possibly through additional filtering utilities, to produce the "real" documents. At some time in the future extract_dd will be enhanced to optionally include HTML tags in the output. This will allow embedded DDs to be converted directly into web pages for use with web browsers.

Input Format
extract_dd assumes that source code is broken into modules, each of which is normally composed of a header file and an implementation file (often referred to as .h and .c files, although other suffixes are possible). It recognizes that some modules may have only an implementation file. extract_dd expects both the header and implementation files to conform to a standard, including standard file and function prologs. It also expects that when both files are supplied, the header file will appear first on the command line.
Blank lines and lines containing only comment markers will be ignored. Leading "*" characters will be removed from input lines.
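The line clean-up described above can be sketched in C as follows. This is only an illustration, not the code extract_dd itself uses (the tool is an awk script). The function names are hypothetical, the set of "comment markers" is assumed to be /*, */, // and a lone *, and skipping leading white space together with the asterisks is also an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* True when the line is blank or holds nothing but a single comment
 * marker. The marker set checked here is an assumption of this sketch. */
bool is_ignorable(const char *line)
{
    const char *start = line + strspn(line, " \t");
    size_t len = strcspn(start, " \t\r\n");      /* length of first token */
    const char *rest = start + len;

    if (rest[strspn(rest, " \t\r\n")] != '\0')
        return false;                            /* more text follows */

    return len == 0
        || (len == 1 && start[0] == '*')
        || (len == 2 && (strncmp(start, "/*", 2) == 0
                      || strncmp(start, "*/", 2) == 0
                      || strncmp(start, "//", 2) == 0));
}

/* Returns a pointer just past any leading white space and '*' characters.
 * Skipping the white space as well is an assumption of this sketch. */
const char *skip_leading_stars(const char *line)
{
    return line + strspn(line, " \t*");
}
```

With these helpers, a prolog line such as " * Unit: API Library" would be reduced to "Unit: API Library" before any keyword matching takes place.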
The following are the expected layouts for the header and implementation files. The significant keywords are explained in the Keywords section, which describes how each one is used and where the resulting info is put in the output. Note that these standard layouts are described more fully in the OAO C/C++ Style Guides. Only the parts important to the operation of extract_dd are addressed here.

Header File Layout

The header file is intended to define all the external interfaces for the module. It will be included by any code which wishes to use the module, and by the implementation file. This is taken from the file apigen/code/include/example.h:

    #ifndef MODULE_H
    #define MODULE_H
    /* --------------------------------------------------------------------------
     * <boilerplate deleted>
     *
     * ****************************************************************************

    Unit: API Library - Runtime Common

    Module name: module.h

    Description: what this class or module does

    Development environment: Independent

    Modification history:
    <boilerplate deleted>

    Notes:

    To do:

    ****************************************************************************
    */

    static char MODULE_H_W [] = "@(#) MODULE.h version"; // id for what

    // system includes
    #include <example_sytem_file.h>

    // local includes
    #include "example_local_file.h"

    // Constants

    // typedefs and Enum definitions

    // class declarations

    // function prototypes

    // global data definitions

    // macro definitions

    #endif // MODULE_H

Implementation File Layout

The implementation file defines the routines which were declared in the header file, as well as declaring and defining private types, data, and routines. The standard file prologs for the header and implementation files are very similar. In cases where both the header and implementation files define the same information, the info from the header file is used. This is taken from the file apigen/code/src/example.c:

    /* --------------------------------------------------------------------------
     * <boilerplate deleted>
     *
     * ****************************************************************************

    Unit: API Library - Runtime Common

    Module name: module.c

    Description: what this class or module does

    Development environment: Independent

    Modification history:
    <boilerplate deleted>

    Notes:

    To do:

    ****************************************************************************
    */

    static char MODULE_H_W [] = "@(#) MODULE.c version"; // id for what

    // system includes
    #include <example_sytem_file.h>

    // local includes
    #include "example_local_file.h"

    // convenience type definitions

    // convenience macro definitions

    // class definitions

    // function definitions

Function Prolog

Each function defined in the implementation file should include a function prolog. Note that the function prolog differs somewhat between "regular" functions (not associated with a C++ class) and member functions. The example file for regular functions is in /apigen/code/src/function_template.txt. The following definition is for a member function, taken from /apigen/code/src/member_function_template.txt:

    return_type class_name :: member_name(arg1type arg1,       //()
                                          arg2type arg2,...)   //()
    /*-----------------------------------------------------------------
     *
     * Description:
     *     Overall description of the function.
     *
     * Inputs:
     *     input_name: description of why it's needed.
     *
     * Outputs:
     *     output_name: description of why it's updated.
     *
     * Algorithm:
     *     Name of algorithm and references. May be blank.
     *
     * Notes:
     *     whatever you need
     *
     *-----------------------------------------------------------------
     */
    {
        //__ logic flow goes here
    }   // member_name

Keywords

Keywords are specific words, phrases, or character strings which extract_dd recognizes as meaning that information should be (or should no longer be) collected for inclusion into the extracted DD.

Keyword index: #ifndef, Algo, Descr, Devel env, #include, Inputs, markers, Module name, Notes, Outputs, Prototype, separators, Unit

#ifndef #ifndef <module name>_h is expected to be the first line of the header file. If it isn't, extract_dd will get confused about whether it's processing a header file or an implementation file.
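As a rough illustration of this rule, the check below classifies a file by its first line. It is a sketch only: the function name is hypothetical, and the real test inside extract_dd may be stricter or looser than a simple prefix match.

```c
#include <stdbool.h>
#include <string.h>

/* True when the first line of a file starts with "#ifndef" followed by
 * white space, which this sketch takes to mean "header file". */
bool looks_like_header(const char *first_line)
{
    static const char guard[] = "#ifndef";
    size_t n = sizeof guard - 1;

    return strncmp(first_line, guard, n) == 0
        && (first_line[n] == ' ' || first_line[n] == '\t');
}
```

A file whose first line is a comment or a blank line would be treated as an implementation file under this check, which matches the confusion the text warns about.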

*****, ----, ... Most lines containing more than five asterisk (*) or hyphen (-) characters are regarded as section separators. They end any block currently being collected (for example, see Description: below), and are not put into the section.
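A minimal sketch of the separator test follows. The text says only that "most" such lines count, so the exact rule is not known; this version assumes that the characters are counted anywhere on the line and that asterisks and hyphens are counted separately. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when the line holds more than five '*' characters or more than
 * five '-' characters (assumed reading of the separator rule). */
bool is_separator(const char *line)
{
    size_t stars = 0;
    size_t hyphens = 0;

    for (const char *p = line; *p != '\0'; ++p) {
        if (*p == '*') {
            ++stars;
        } else if (*p == '-') {
            ++hyphens;
        }
    }
    return stars > 5 || hyphens > 5;
}
```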

Unit: The Unit: line identifies which unit (sometimes called subsystem) this module is part of. It is put into the module header. The "Unit:" is removed from the line, along with leading white space, and then the line is added to the section.

Module name: The Module name: line identifies which module is being declared or defined. It is put into the module header. The "Module name:" is removed from the line, along with leading white space, and then the line is added to the section.
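The handling of single-line keywords such as Unit: and Module name: can be sketched as below. The helper name is hypothetical, and the sketch assumes the leading "*" characters have already been removed from the line.

```c
#include <stddef.h>
#include <string.h>

/* If `line` begins with `keyword`, returns a pointer to the text that
 * follows it with leading white space skipped. Returns NULL otherwise. */
const char *keyword_value(const char *line, const char *keyword)
{
    size_t n = strlen(keyword);

    if (strncmp(line, keyword, n) != 0) {
        return NULL;
    }
    return line + n + strspn(line + n, " \t");
}
```

For the example prolog, keyword_value("Unit: API Library - Runtime Common", "Unit:") yields "API Library - Runtime Common", which is the text that would be added to the module header.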

Description: The Description: keyword starts collecting a block of info. The "Description:" keyword is removed, then the line is added to the block. Lines are added to the block until the next keyword or separator is found. The information collected is put into the module header if it is part of the file prolog, or into the function description otherwise.

Development Environment: The Development environment: line identifies the development environment for the module. It is put into the module header. The "Development environment:" is removed from the line, along with leading white space, and then the line is added to the section.

Notes: The Notes: keyword starts collecting a block of info. The "Notes:" keyword is removed, then the line is added to the block. Lines are added to the block until the next keyword or separator is found. The information collected is put into the module description if it is part of the file prolog, or into the function description otherwise.

#include Each #include line has the file name extracted from within the "" or <> and added to the dependencies section. extract_dd assumes that all the #includes come before any function prototypes. Any which come after may end up in the wrong place.
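Extracting the file name from an #include line might look like the sketch below. The function name and the caller-supplied buffer are illustrative choices, not part of extract_dd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies the name found between "" or <> on an #include line into `out`.
 * Returns false if the line is not an #include, the closing delimiter is
 * missing, or the name does not fit in `out`. */
bool include_name(const char *line, char *out, size_t out_size)
{
    static const char directive[] = "#include";
    const char *p = line + strspn(line, " \t");
    const char *end;
    char close;
    size_t len;

    if (strncmp(p, directive, sizeof directive - 1) != 0) {
        return false;
    }
    p += sizeof directive - 1;
    p += strspn(p, " \t");

    if (*p == '"') {
        close = '"';
    } else if (*p == '<') {
        close = '>';
    } else {
        return false;
    }
    ++p;

    end = strchr(p, close);
    if (end == NULL) {
        return false;
    }
    len = (size_t)(end - p);
    if (len + 1 > out_size) {
        return false;
    }
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}
```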

prototype Function prototypes must be marked with the interface marker to be correctly recognized and stored. If the prototype extends across multiple source lines, each line must be marked. Prototypes (of course) go in the function prototype output section.

Inputs: The Inputs: keyword starts collecting a block of info (the "Inputs:" keyword is retained). Lines are added to the block until the next keyword or separator is found. The information collected is put into the inputs description for the current function.

Outputs: The Outputs: keyword starts collecting a block of info (the "Outputs:" keyword is retained). Lines are added to the block until the next keyword or separator is found. The information collected is put into the Outputs description for the current function.

Algorithm: The Algorithm: keyword starts collecting a block of info. The "Algorithm:" keyword is removed, then the line is added to the block. Lines are added to the block until the next keyword or separator is found. The information collected is put into the module header if it is part of the file prolog, or into the function description otherwise.
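The block keywords above (Description:, Notes:, Inputs:, Outputs:, Algorithm:) all follow the same pattern: start a block, then keep adding lines until the next keyword or separator. The sketch below shows one way that loop could work. It is an assumption-laden illustration, not the tool's own code: the keyword list is taken from this document and may be incomplete, the separator rule is the simple character count, input lines are assumed to be already cleaned of leading "*" characters, and a keyword line that is empty once the keyword is removed is simply skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Keywords named in this document; the real list may differ. */
static const char *const KEYWORDS[] = {
    "Unit:", "Module name:", "Description:", "Development environment:",
    "Notes:", "Inputs:", "Outputs:", "Algorithm:"
};
static const size_t KEYWORD_COUNT = sizeof KEYWORDS / sizeof KEYWORDS[0];

/* Length of the keyword that starts `line`, or 0 if none does. */
static size_t keyword_length(const char *line)
{
    for (size_t i = 0; i < KEYWORD_COUNT; ++i) {
        size_t n = strlen(KEYWORDS[i]);
        if (strncmp(line, KEYWORDS[i], n) == 0) {
            return n;
        }
    }
    return 0;
}

/* More than five '*' or '-' characters marks a separator (assumed rule). */
static bool is_separator(const char *line)
{
    size_t stars = 0, hyphens = 0;
    for (; *line != '\0'; ++line) {
        if (*line == '*') ++stars;
        else if (*line == '-') ++hyphens;
    }
    return stars > 5 || hyphens > 5;
}

/* Collects the block starting at lines[start] into `out`, one line per
 * row, and returns the index of the line that ended the block (or
 * `count`). With keep_keyword false the leading keyword is dropped, as
 * for "Description:"; with true it is retained, as for "Inputs:". */
size_t collect_block(const char *const lines[], size_t count, size_t start,
                     bool keep_keyword, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i = start;

    if (out_size > 0) {
        out[0] = '\0';
    }
    for (; i < count; ++i) {
        const char *text = lines[i];
        int n;

        if (i == start) {
            if (!keep_keyword) {
                text += keyword_length(text);
                text += strspn(text, " \t");
            }
            if (text[0] == '\0') {
                continue;               /* nothing left on the keyword line */
            }
        } else if (keyword_length(text) > 0 || is_separator(text)) {
            break;                      /* next keyword or separator */
        }
        n = snprintf(out + used, out_size - used, "%s\n", text);
        if (n < 0 || (size_t)n >= out_size - used) {
            if (used < out_size) {
                out[used] = '\0';       /* drop the partial line */
            }
            break;                      /* out of room: stop collecting */
        }
        used += (size_t)n;
    }
    return i;
}
```

A caller would invoke collect_block each time it meets a block keyword, then resume scanning at the returned index so that the terminating keyword or separator is handled in its own right.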

Markers

"Markers" are specially formatted comments which indicate that a source line of code is a part of the embedded detailed design, and should be included in a particular section of the output. Currently there are two markers defined, the interface marker and the logic flow marker.
Markers may be placed before any other text on the line, in the middle of any text, or after any text. Since markers are comments, and the text they mark is usually "real" code, this is significant. Any text to the right of the marker will be commented out, and never seen by the compiler.
In all cases the marker itself will be deleted, but no other change will be made to the line. In particular, white space will be preserved. This means that indentation can be used to indicate nesting in the declaration or logic flow being marked.
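The deletion rule can be sketched as follows, using the four marker spellings described in this section (/*() and //() for the interface marker, /*__ and //__ for the logic flow marker). The enum and function names are hypothetical. The sketch removes only the four marker characters; it leaves any closing */ in place, since the text does not say what happens to it.

```c
#include <stddef.h>
#include <string.h>

enum marker_kind { MARKER_NONE, MARKER_INTERFACE, MARKER_LOGIC_FLOW };

/* Deletes the first marker found in `line`, in place, and reports its
 * kind. Every other character, including indentation, is left untouched. */
enum marker_kind strip_marker(char *line)
{
    static const struct {
        const char *text;
        enum marker_kind kind;
    } markers[] = {
        { "/*()", MARKER_INTERFACE },  { "//()", MARKER_INTERFACE },
        { "/*__", MARKER_LOGIC_FLOW }, { "//__", MARKER_LOGIC_FLOW },
    };
    char *first = NULL;
    enum marker_kind kind = MARKER_NONE;

    for (size_t i = 0; i < sizeof markers / sizeof markers[0]; ++i) {
        char *hit = strstr(line, markers[i].text);
        if (hit != NULL && (first == NULL || hit < first)) {
            first = hit;
            kind = markers[i].kind;
        }
    }
    if (first != NULL) {
        /* All markers are four characters long; close the gap they leave. */
        memmove(first, first + 4, strlen(first + 4) + 1);
    }
    return kind;
}
```

Because only the marker is removed, a line such as "    //__ logic flow goes here" keeps its indentation, so nesting in the logic flow survives extraction.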
The interface marker is either /*() for files which must be processible by the "regular" C compiler (and lines marked this way must have a matching */), or //() for lines which will be processed by the C++ compiler.
Interface markers which are found in header files indicate that the particular line should be added to the externals portion of the module description. Interface markers found in implementation files mark function declarations. Any module or routine information saved up to this point will be output, and info will now be collected for a new function.
The logic flow marker is either /*__ for files which must be processible by the "regular" C compiler (and lines marked this way must have a matching */), or //__ for lines which will be processed by the C++ compiler. Logic flow markers always indicate that the particular line should be added to the logic flow portion of the current function description.
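To make the marker rules concrete, here is a small function written for a plain C compiler, so every marker uses the /* ... */ form with a matching close. The function clamp_count is a made-up example and the function prolog is omitted for brevity. The interface markers sit after the code, so the compiler still sees the definition header. The logic flow markers sit before their text, so that text is an ordinary comment.

```c
int clamp_count(int count,   /*() */
                int limit)   /*() */
{
    /*__ if count is above limit */
    /*__     return limit */
    if (count > limit) {
        return limit;
    }
    /*__ otherwise return count unchanged */
    return count;
}
```

Under the rules above, the two lines carrying /*() would start a new function entry, and the three lines carrying /*__ would form its logic flow, with the extra indentation on the second one showing the nesting.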

Future Enhancements
At some point in the future, a command-line option will allow users to request that extract_dd include HTML tags in its output so embedded DDs can be converted directly into web pages for use with web browsers. Additional languages can be supported simply by recognizing markers with the particular language's type of comments.
At this time, no other enhancements are planned. However, this does not mean that no enhancements are possible. extract_dd is written in nawk (or gawk, depending on which one is installed on your system), so it should be fairly easy to enhance. Recognizing new keywords, re-ordering the output, or adding new markers are all possibilities. Additional markers (such as markers to turn on and off change bars in the extracted DD, or to mark the beginning and end of a block of text to go in the current section of the output) could be defined if needed. Please contact Jon Stewart-Taylor for more info.