Why HWEB?

Why am I writing the HWEB system of "semi-literate" programming? This rationale will probably make more sense if you've looked briefly at Donald Knuth's system for literate programming, called simply WEB.

WEB produces very nice documentation from the TeX part of the source code, and it has certainly proven its worth, in that Knuth managed to write at least two widely-used programs in it without many bugs at all (though I think that says more about Knuth than about his language!). Even a non-programmer can look at the output from weave and get a decent idea of how a well-written WEB program works, because the language provides some really nice ways to organize the programmer's ideas in a logical order, rather than the order in which the compiler wants them; and TeX is a pretty awesome typesetting engine (in both the literal and figurative senses).

So why don't I like WEB? Take a look at the actual source code of a WEB program. Sure, it compiles to a nice-looking PDF, but the code itself—the actual stuff with which the programmer has to deal—is pretty incomprehensible. All those at-signs and one-letter mnemonics are confusing, and the syntax rules of the WEB language—what controls can be placed in which sections, and when whitespace is significant, and so on—are very hard to follow.[1] (The TeX macro language has the same problems with ease-of-use. It's partly the "line-noise factor," as in Perl, but it's also that Knuth designed these languages to be used by Knuth, not by Joe Q. Hacker, who presumably hasn't spent ten years writing the language himself.)

Some people use WEB and like it. HWEB is probably not for them. The goal of HWEB is to take the philosophy and broad structure of WEB and make it accessible to the aforementioned Joe Q. Hacker. First of all, and most drastically, I replace the TeX output by HTML. This makes the documentation truly portable to systems where TeX is uncommon (e.g., Microsoft Windows), and makes for one or two fewer steps in the modify-compile-view process. Yet HWEB retains the most often-used features of TeX, such as a distinction between "plain text mode" and "math mode." (In fact, HWEB, like TeX, has two math modes: an "inline" mode for snippets and an "out-of-line" mode for long formulas.)

I promise that HWEB will never use the at-sign for any purpose whatsoever. Because it's butt-ugly, and it scares the reader. My current plan is, if HWEB ever expands to doing the rearranging of source code that Knuth's system does, to simply use keywords like Begin and End to introduce sections (rather than WEB's ugly @1 and @ commands—note the significant whitespace in that latter one). The current version of hweav, the HWEB version of WEB's weave, simply parses any text inside C-style comment delimiters, and that's a nice easy option, though not nearly as useful for self-documentation purposes.
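To make the comment-parsing idea concrete, here is a minimal sketch in Python of how a tool might separate documentation from code by looking only at C-style comment delimiters. This is not hweav's actual implementation; the function name split_source and the ("doc", ...) / ("code", ...) chunk format are made up for illustration, and the sketch makes no attempt to handle comment delimiters that appear inside string literals.

```python
import re

# Hypothetical sketch, not the real hweav: treat everything between
# /* and */ as documentation and everything else as code.
COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)

def split_source(source):
    """Return a list of ("doc", text) and ("code", text) chunks, in order."""
    chunks = []
    pos = 0
    for m in COMMENT.finditer(source):
        code = source[pos:m.start()]
        if code.strip():
            chunks.append(("code", code.strip("\n")))
        chunks.append(("doc", m.group(1).strip()))
        pos = m.end()
    tail = source[pos:]
    if tail.strip():
        chunks.append(("code", tail.strip("\n")))
    return chunks
```

Fed a C file, this yields an alternating stream of documentation and code chunks, which a formatter could then emit as HTML paragraphs and preformatted blocks respectively.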

While we're on the subject of what HWEB doesn't do that it ought to: It doesn't produce indices, nor internal hyperlinks, nor (most importantly) any kind of sectioning information at all. Sectioning is important so that the reader is not overwhelmed with code all at once; he needs the landmarks in the output which WEB provides via "section" commands.

HWEB's major goal is to make documentation painless. In WEB, documentation sections are often filled with \def and \.{icky brackets}, which is a turn-off to the source-diving Joe Q. In HWEB, there is no \def; the idea is that it should not be necessary. If the user wants to write in italics, he doesn't need to write

I say, {\it hello world!}

He simply types, as he might in an email conversation,

I say, /hello world!/

and the HWEB system takes care of the italicization for him. The HWEB system also silently takes care of a lot of mathematical notation, so that where a WEB user would write

$$\sum_{i=0}^{\infty} A_{ij}$$

the HWEB user would write

$
   \sum_{i=0}^\inf Aij
$

to produce

∞
∑ A_ij
i=0

(The same expression, in "inline" math mode, would be rendered as ∑_i=0..∞ A_ij.) Note the automatic subscripting performed by HWEB; this is particularly nice in the case of variables named x₁, for example.
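The automatic subscripting can be sketched in a few lines of Python. The rule below is my guess at a plausible heuristic, not the rule HWEB actually uses: within math mode only, a single letter followed by digits (x1), or a single capital letter followed by one or two lowercase index letters (Aij), is split into a base and a subscript. The name auto_subscript and the use of the HTML sub element are assumptions for illustration.

```python
import re

# Assumed heuristic (not taken from HWEB itself):
#   letter + digits            ->  x1   becomes x<sub>1</sub>
#   capital + 1-2 lowercase    ->  Aij  becomes A<sub>ij</sub>
SUBSCRIPT = re.compile(r"\b([A-Za-z])(\d+)\b|\b([A-Z])([a-z]{1,2})\b")

def auto_subscript(expr):
    """Rewrite implicit subscripts in a math-mode string as HTML."""
    def repl(m):
        base = m.group(1) or m.group(3)
        sub = m.group(2) or m.group(4)
        return f"{base}<sub>{sub}</sub>"
    return SUBSCRIPT.sub(repl, expr)
```

A heuristic like this would mangle ordinary words such as "In", which is one reason to confine it to math mode, and a good example of the tweaking that "Do What I Mean" rules tend to need.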

Other features of HWEB include the automatic smallcapping of capitalized names; e.g. "Donald KNUTH" in the input comes out with the surname set in small capitals (a full-size K followed by small-capital NUTH). HWEB follows TeX's lead in breaking paragraphs at blank lines and replacing sequences --- and -- with em and en dashes, respectively.
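The plain-text conveniences described so far (slash-delimited italics, small-capped names, dash replacement, and blank-line paragraph breaks) could be approximated along the following lines. Again, this is a hedged sketch in Python under my own assumptions, not HWEB's code: the function names, the exact regular expressions, and the choice of a span with font-variant: small-caps are all hypothetical.

```python
import re

def _smallcap(m):
    # Keep the first capital full-size; lowercase the rest so that CSS
    # small-caps renders them as small capitals.
    return (m.group(1)
            + '<span style="font-variant: small-caps">'
            + m.group(2).lower()
            + "</span>")

def format_text(text):
    """Turn 'just written' plain text into HTML paragraphs."""
    # Replace the longer dash sequence first so "---" is not read as "--" + "-".
    text = text.replace("---", "&mdash;").replace("--", "&ndash;")
    # /slashed phrases/ become italics; slashes inside words (a/b) are left alone.
    text = re.sub(r"(?<![\w/])/([^/\n]+)/(?![\w/])", r"<i>\1</i>", text)
    # Words of three or more capitals are treated as names to small-cap.
    text = re.sub(r"\b([A-Z])([A-Z]{2,})\b", _smallcap, text)
    # Blank lines separate paragraphs, as in TeX.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)
```

Note that this naive small-cap rule would also catch acronyms such as HTML, so a real implementation would need some way to tell names from abbreviations.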

The user naturally sacrifices some control, and some quality, by using HWEB and HTML instead of WEB and TeX. HTML particularly suffers when it comes to mathematical notation, although HWEB plays many clever tricks with tables in "out-of-line" math mode (as in the above example). But I believe the ease of use and clarity of documentation written in HWEB more than make up for the system's typesetting deficiencies.

There's a long way to go yet with HWEB, and a lot of small perks that could probably be added. (The guiding principle, as I've said, is that the user ought to be able to "just write," and have the formatting engine figure out how best to format his documentation without a lot of control codes cluttering the source. This means Perl-like "Do What I Mean" semantics, which can take a lot of tweaking.)

Arthur O'DWYER
July 2004

Footnotes

[1] For example, the section identifiers @<foo@> and @< foo @> are treated as the same name, but @ifoo@> and @i foo @> have very different effects, as do @Do something and @ Do something. Sure, it makes parsing easy, but if you expend the effort to actually create something this potentially cool, an unsurprising parser ought to be one of the easier things to get right.