docstrip - Docstrip style source code extraction
    package require Tcl 8.4
    package require docstrip ?1.2?

    docstrip::extract text terminals ?option value ...?
    docstrip::sourcefrom filename terminals ?option value ...?
In short, the basic principle of literate programming is that program source should primarily be written and structured to suit the developers (and advanced users who want to peek "under the hood"), not to suit the whims of a compiler or corresponding source code consumer. This means literate sources often need some kind of "translation" to an illiterate form that dumb software can understand. The docstrip Tcl package handles this translation.
Even for those who do not wholeheartedly subscribe to the philosophy behind literate programming, docstrip can bring greater clarity to in particular:
The way it works is that the programmer edits directly only one or several "master" source code files, from which docstrip generates the more traditional "source" files compilers or the like would expect. The master sources typically contain a large amount of documentation of the code, sometimes even in places where the code consumers would not allow any comments. The etymology of "docstrip" is that this documentation was stripped away (although "code extraction" might be a better description, as it has always been a matter of copying selected pieces of the master source rather than deleting text from it). The docstrip Tcl package contains a reimplementation of the basic extraction functionality from the docstrip program, and thus makes it possible for a Tcl interpreter to read and interpret the master source files directly.
Readers who are not previously familiar with docstrip but want to know more about it may consult the following sources.
    docstrip::extract [join {
      {% comment}
      {% more comment !"#$%&/(}
      {some command}
      { % blah $blah "Not a comment."}
      {% abc; this is comment}
      {# def; this is code}
      {ghi}
      {% jkl}
    } \n] {}

returns the same sequence of lines as

    join {
      {some command}
      { % blah $blah "Not a comment."}
      {# def; this is code}
      {ghi} ""
    } \n

It does not matter to docstrip what format is used for the documentation in the comment lines, but in order to do better than plain text comments, one typically uses some markup language. Most commonly LaTeX is used, as that is a very established standard and also provides the best support for mathematical formulae, but the docstrip::util package also gives some support for doctools-like markup.
Besides the basic code and comment lines, there are also guard lines, which begin with the two characters '%<', and meta-comment lines, which begin with the two characters '%%'. Within guard lines there is furthermore the distinction between verbatim guard lines, which begin with '%<<', and ordinary guard lines, where the '%<' is not followed by another '<'. The last category is by far the most common.
Ordinary guard lines condition extraction of the code line(s) they guard on the value of a boolean expression; the guarded block of code lines will only be included if the expression evaluates to true. The syntax of an ordinary guard line is one of
    '%' '<' STARSLASH EXPRESSION '>'
    '%' '<' PLUSMINUS EXPRESSION '>' CODE

where

    STARSLASH  ::= '*' | '/'
    PLUSMINUS  ::= | '+' | '-'
    EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION | SECONDARY '|' EXPRESSION
    SECONDARY  ::= PRIMARY | PRIMARY '&' SECONDARY
    PRIMARY    ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
    CODE       ::= { any character except end-of-line }

Comma and vertical bar both denote 'or'. Ampersand denotes 'and'. Exclamation mark denotes 'not'. A TERMINAL can be any nonempty string of characters not containing '>', '&', '|', comma, '(', or ')'. The docstrip manual is a bit more restrictive and only guarantees proper operation for strings of letters, even though the LaTeX core sources also make heavy use of digits in TERMINALs. The second argument of docstrip::extract is the list of those TERMINALs that should count as having the value 'true'; all other TERMINALs count as being 'false' when guard expressions are evaluated.
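The way a guard EXPRESSION is evaluated against the list of true TERMINALs can be sketched in a few lines of Tcl. This is only an illustration of the grammar above, not the code used inside the docstrip package; the procedure name guard_true is hypothetical, and the sketch assumes that '!' never occurs inside a TERMINAL and that the expression contains no whitespace.

```tcl
# Hypothetical helper, not part of the docstrip package: evaluate a
# guard EXPRESSION given the list of TERMINALs that count as true.
proc guard_true {expression terminals} {
    set translated ""
    # Split into single operator characters and maximal runs of
    # other characters (the TERMINALs).
    foreach token [regexp -all -inline {[,|&!()]|[^,|&!()]+} $expression] {
        switch -exact -- $token {
            "," -
            "|" {append translated " || "}
            "&" {append translated " && "}
            "!" {append translated " !"}
            "(" -
            ")" {append translated $token}
            default {
                # A TERMINAL is true exactly when it is in the list.
                append translated \
                    [expr {[lsearch -exact $terminals $token] >= 0}]
            }
        }
    }
    # The translated string contains only 0, 1, ||, &&, !, parentheses
    # and spaces, so handing it to expr unbraced is deliberate here.
    return [expr $translated]
}

puts [guard_true {foo&!bar} {foo}]
puts [guard_true {(a|b)&c} {a}]
```

Because '&&' binds tighter than '||' in Tcl expressions, the translation preserves the precedence given by the SECONDARY and EXPRESSION rules without any extra parentheses.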
In the case of a '%<*EXPRESSION>' guard, the lines guarded are all lines up to the next '%</EXPRESSION>' guard with the same EXPRESSION (compared as strings). The blocks of code delimited by such '*' and '/' guard lines must be properly nested.
    set text [join {
      {begin}
      {%<*foo>}
      {1}
      {%<*bar>}
      {2}
      {%</bar>}
      {%<*!bar>}
      {3}
      {%</!bar>}
      {4}
      {%</foo>}
      {5}
      {%<*bar>}
      {6}
      {%</bar>}
      {end}
    } \n]
    set res [docstrip::extract $text foo]
    append res [docstrip::extract $text {foo bar}]
    append res [docstrip::extract $text bar]

sets $res to the result of

    join {
      {begin} {1} {3} {4} {5} {end}
      {begin} {1} {2} {4} {5} {6} {end}
      {begin} {5} {6} {end} ""
    } \n

In guard lines without a '*', '/', '+', or '-' modifier after the '%<', the guard applies only to the CODE following the '>' on that single line. A '+' modifier is equivalent to no modifier. A '-' modifier is like the case with no modifier, but the expression is implicitly negated, i.e., the CODE of a '%<-' guard line is only included if the expression evaluates to false.
Meta-comment lines are "comment lines which should not be stripped away" but should instead be extracted like code lines; these are sometimes used for copyright notices and similar material. The '%%' prefix is, however, not kept, but is replaced by the current -metaprefix, which is customarily set to some "comment until end of line" character (or character sequence) of the language of the code being extracted.
    set text [join {
      {begin}
      {%<foo> foo}
      {%<+foo>plusfoo}
      {%<-foo>minusfoo}
      {middle}
      {%% some metacomment}
      {%<*foo>}
      {%%another metacomment}
      {%</foo>}
      {end}
    } \n]
    set res [docstrip::extract $text foo -metaprefix {# }]
    append res [docstrip::extract $text bar -metaprefix {#}]

sets $res to the result of

    join {
      {begin} { foo} {plusfoo} {middle} {# some metacomment} {# another metacomment} {end}
      {begin} {minusfoo} {middle} {# some metacomment} {end} ""
    } \n

Verbatim guards can be used to force code line interpretation of a block of lines even if some of them happen to look like any other type of lines to docstrip. A verbatim guard has the form '%<<END-TAG' and the verbatim block is terminated by the first line that is exactly '%END-TAG'.
    set text [join {
      {begin}
      {%<*myblock>}
      {some stupid()}
      { #computer<program>}
      {%<<QQQ-98765}
      {% These three lines are copied verbatim (including percents}
      {%% even if -metaprefix is something different than %%).}
      {%</myblock>}
      {%QQQ-98765}
      { using*strange@programming<language>}
      {%</myblock>}
      {end}
    } \n]
    set res [docstrip::extract $text myblock -metaprefix {# }]
    append res [docstrip::extract $text {}]

sets $res to the result of

    join {
      {begin}
      {some stupid()}
      { #computer<program>}
      {% These three lines are copied verbatim (including percents}
      {%% even if -metaprefix is something different than %%).}
      {%</myblock>}
      { using*strange@programming<language>}
      {end}
      {begin} {end} ""
    } \n

The processing of verbatim guards takes place also inside blocks of lines which due to some outer block guard will not be copied.
The final piece of docstrip syntax is that extraction stops at a line that is exactly "\endinput"; this is often used to avoid copying random whitespace at the end of a file. In the unlikely case that one wants such a code line, one can protect it with a verbatim guard.
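As a small illustration of the two statements above, the following sketch feeds docstrip::extract one text that ends with "\endinput" and one where such a line is protected by a verbatim guard. The sample lines and the END-TAG "STOP" are made up for the example. It assumes that the docstrip package from Tcllib is installed.

```tcl
package require docstrip

# Extraction should stop at the \endinput line, so the last line of
# this text is not expected to appear in the result.
set text [join {
    {first code line}
    {% a comment line}
    {second code line}
    {\endinput}
    {never extracted}
} \n]
set res [docstrip::extract $text {}]
puts -nonewline $res

# Inside a verbatim block the same line is treated as ordinary code,
# so extraction should continue past it.
set protected [join {
    {%<<STOP}
    {\endinput}
    {%STOP}
    {still extracted}
} \n]
set res2 [docstrip::extract $protected {}]
puts -nonewline $res2
```

The empty list of terminals is sufficient here because neither text contains any ordinary guard lines.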
For a document format that does not require any non-Tcl software, see the ddt2man command in the docstrip::util package. It is suggested that files employing that document format are given the suffix .ddt, to distinguish them from the more traditional LaTeX-based .dtx files.
Master source files with .dtx extension are usually set up so that they can be typeset directly by latex without any support from other files. This is achieved by beginning the file with the lines
    % \iffalse
    %<*driver>
    \documentclass{tclldoc}
    \begin{document}
    \DocInput{filename.dtx}
    \end{document}
    %</driver>
    % \fi

or some variation thereof. The trick is that the file gets read twice. With normal LaTeX reading rules, the first two lines are comments and therefore ignored. The third line is the document preamble, the fourth line begins the document body, and the sixth line ends the document, so LaTeX stops there; non-comments below that point in the file are never subjected to the normal LaTeX reading rules. Before that, however, the \DocInput command on the fifth line is processed, and that does two things: it changes the interpretation of '%' from "comment" to "ignored", and it inputs the file specified in the argument (which is normally the name of the file the command is in). It is this second time that the file is being read that the comments and code in it are typeset.
The function of the \iffalse ... \fi is to skip lines two to seven on this second time through; this is similar to the "if 0 { ... }" idiom for block comments in Tcl code, and it is needed here because (amongst other things) the \documentclass command may only be executed once. The function of the <driver> guards is to prevent this short piece of LaTeX code from being extracted by docstrip. The total effect is that the file can function both as a LaTeX document and as a docstrip master source code file.
It is not necessary to use the tclldoc document class, but that does provide a number of features that are convenient for .dtx files containing Tcl code. More information on this matter can be found in the references above.
docstrip_util
documentation, source, literate programming, docstrip, LaTeX, .dtx