Literate programming: Knuth is doing it wrong (2014)

Oct 3, 2014

Literate programming advocates this: Order your code for others to read,
not for the compiler. Beautifully typeset your code so one can curl up in bed
to read it like a novel. Keep documentation in sync with code. What’s not to
like about this vision? I have two beefs with it: the ends are insufficiently
ambitious, focusing on a passive representation; and the means are
insufficiently polished, over-emphasizing typesetting at the cost of prose
quality. Elaboration, in reverse order:

Canonizing typesetting over organization

When I look around at the legacy of literate programming, systems to do
so-called semi- or quasi-literate programming dominate. These are systems that
focus on generating beautifully typeset documentation without allowing the
author to arbitrarily order code. I think this is exactly backwards; codebases
are easy to read primarily due to the author’s efforts to orchestrate the
presentation, and only secondarily by typesetting improvements. As a concrete
example, just about every literate program out there begins with cruft like
this:1

// Some #includes

or:

-- Don't mind these imports.

I used to think people just didn’t understand Knuth’s vision. But then I went
and looked at his literate programs. Boom, #includes:

The example Pascal program in Knuth’s original paper didn’t have any imports
at all. But when it comes to organizing larger codebases, we’ve been putting
imports thoughtlessly at the top. Right from day one.

Exhibit 2:

“Skip ahead if you are impatient to see the interesting stuff.”
Well gee, if only we had, you know, a tool to put the interesting stuff up
front.
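
To make the reordering idea concrete, here is a minimal sketch of what such a tool does at its core: the author writes chunks in reading order, and the tool reassembles them in the order the compiler wants. The `Chunk` struct, the section names, and the `tangle` function are all made up for illustration; this is not how WEB, CWEB, or any other real tool is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: one piece of code from the prose, tagged with the section of
// the generated source file it belongs to.
struct Chunk {
  std::string section;  // e.g. "includes" or "functions" (names are assumptions)
  std::string code;
};

// Reassemble chunks written in reading order into compile order.
// Chunks in the same section keep the order in which they were written.
std::string tangle(const std::vector<Chunk>& reading_order,
                   const std::vector<std::string>& compile_order) {
  std::map<std::string, std::string> by_section;
  for (const Chunk& chunk : reading_order)
    by_section[chunk.section] += chunk.code + "\n";
  std::string out;
  for (const std::string& section : compile_order)
    out += by_section[section];
  return out;
}

// The prose leads with the interesting function and leaves the #include for
// last; the tangled output still puts the #include first.
std::string tangle_demo() {
  const std::vector<Chunk> prose = {
    {"functions", "int dist(int a, int b) { return std::abs(a - b); }"},
    {"includes", "#include <cstdlib>"},
  };
  const std::vector<std::string> compile_order = {"includes", "functions"};
  return tangle(prose, compile_order);
}
```

The point of the sketch is only that the order on the page and the order handed to the compiler are independent, so nothing forces the #includes to come first in the exposition.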

Exhibit 3:

This is the start of the piece. There’s a sentence of introduction, and then
this:

We use a utility field to record the vertex degrees.

#define deg u.I

That’s a steep jump across several levels of abstraction. Literally the first
line of code shown is a macro to access a field for presumably a struct whose
definition — whose very type name — we haven’t even seen
yet. (The variable name for the struct is also hard-coded in; but I’ll stop
nit-picking further.)

Exhibit 4: Zoom out just a little bit on the previous example:

Again, there are #includes at the top, but I won’t belabor that. Let’s look at
what’s in these #includes. “GraphBase data structures” seems kinda relevant to
the program. Surely the description should inline and describe the core data
structures the program will be using. In the immortal words of Fred Brooks:

“Show me your flowcharts [code] and conceal your tables [data types], and I
shall continue to be mystified. Show me your tables, and I won’t usually need
your flowcharts; they’ll be obvious.”

Surely a system to optimize order for exposition shouldn’t be stymied by
code in a different file.

On the whole, people have failed to appreciate the promise of literate
programming because the early examples are just not that good, barring the
small program in Knuth’s original paper. The programs jump across abstraction
layers. Problems are ill-motivated. There’s a pervasive mindset of top-down
thinking, of starting from main, whether or not that’s easiest to read. The
ability to change order is under-used, perhaps because early literate tools
made debugging harder, but mostly I think because of all the emphasis, right
from the start, on just how darn cool the typesetting was.2

All this may seem harsh on Knuth, but I figure Knuth can take it. He’s, well,
Knuth, and I’m nobody. He came up with literate programming as the successor
to structured programming, meaning that he was introducing ordering considerations
at a time when most people were still using gotos as a matter of
course. There was no typesetting for programmers or academics, no internet, no
hyperlinking. No, these early examples are fine for what they are. They
haven’t developed because we programmers have failed to develop them
over time. We’ve been too quick to treat them as sacred cows to be merely
interpreted (not noticing the violence our interpretations do to the original
idea anyway). I speculate that nobody has actually read anybody else’s
literate programs in any sort of detail. And so nobody has been truly inspired
to do better. We’ve been using literate programming, like the vast majority of
us use TAOCP, as a signalling device to show that we are hip to what’s cool.
(If you have spent time reading somebody else’s literate programs, I want to
hear about your experiences!)

Canonizing passive reading over interactive feedback

I’ve been indirectly maligning typesetting, but it’s time to aim squarely at
it. There’s a fundamental problem with generating a beautifully typeset
document for a codebase: it’s dead. It can’t render inside just about any
actual programming environment (editor or IDE) on this planet, and so we can’t
make changes to it while we work on the codebase. Everybody reads a pdf about
a program at most once, when they first encounter it. After that, re-rendering
it is a pain, and proof-reading the rendered document, well forget about it.
That dooms generated documentation to be an after-thought, forever at risk of
falling stale, or at least rendering poorly.

You can’t work with it, you can’t try to make changes to it to see what
happens, and you certainly can’t run it interactively. All you can do,
literally, is curl up with it in bed. And promptly fall asleep. I mean, who
reads code in bed without a keyboard?!

What’s the alternative? In the spirit of presenting a target of my own for
others to attack, I’ll point you at some literate code I wrote last year for a
simple interpreter. A sample of what it looks like:

 // Programs are run in two stages:
 //  a) _read_ the text of the program into a tree of cells
 //  b) _evaluate_ the tree of cells to yield a result
 cell* run(istream& in) {
   cell* result = nil;
   do {
       // TEMP and 'update' help recycle cells after we're done with
       // them.
       // Gotta pay attention to this all the time; see the 'memory'
       // layer.
       TEMP(form, read(in));
       update(result, eval(form));
   } while (!eof(in));
   return result;
 }
 
 cell* run(string s) {
   stringstream in(s);
   return run(in);
 }
 
 :(scenarios run)
 :(scenario examples)
 # function call; later we'll support a more natural syntax for
 # arithmetic
 (+ 1 1)
 => 2
 
 # assignment
 (=> 3
 
 # list; deliberately looks just like a function call
 '(1 2 3)
 => (1 2 3)
 
 # the function (fn) doesn't have to be named
 ((fn (a b)  # parameters (params)
     (+ a b))  # body
    3 4)  # arguments (args) that are bound to params inside this call
 => 7

A previous post describes the format, but we won’t need more details for this
example. Just note that it is simple plaintext that will open up in your text
editor. There is minimal prose, because just the order of presentation does so
much heavy lifting. Comments are like code: the less you write, the less there
is to go bad. I’m paying the cost of ‘//’ to delineate comments because I
haven’t gotten around to fixing it, because it’s just not that important to
clean it up. You can’t see it in this sample, but the program at large
organizes features in self-contained layers, with later features hooking into
the code for earlier ones. Here’s a test harness. (With, I can’t resist
pointing out, the includes at the bottom.) Here’s a garbage collector. Here I
replace a flat namespace of bindings with dynamic scope. In each case, code is
freely intermingled with tests to exercise it (like the scenarios above),
tests that can be run from the commandline.

 $ build_and_test_until 029  # exercises the core interpreter
 $ build_and_test_until 030  # exercises dynamic scope
 ...
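
For a rough idea of what building "until" a layer could involve, here is a small sketch that picks which layers to include. It assumes each layer is a file whose name starts with a three-digit number; that naming scheme, the example file names, and the `layers_until` helper are assumptions for illustration, not a description of the actual build script.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Assumption: every layer is a file whose name starts with a three-digit
// number, such as "029eval.cc". Files without that prefix are ignored.
bool has_layer_prefix(const std::string& name) {
  if (name.size() < 3) return false;
  for (int i = 0; i < 3; ++i)
    if (!std::isdigit(static_cast<unsigned char>(name[i]))) return false;
  return true;
}

// Return the layers numbered <= limit, in the order they should be built.
std::vector<std::string> layers_until(const std::vector<std::string>& files,
                                      int limit) {
  std::vector<std::string> selected;
  for (const std::string& name : files)
    if (has_layer_prefix(name) && std::stoi(name.substr(0, 3)) <= limit)
      selected.push_back(name);
  // Fixed-width numeric prefixes make lexicographic order match layer order.
  std::sort(selected.begin(), selected.end());
  return selected;
}
```

Under these assumptions, calling `layers_until(files, 29)` on a directory listing would keep everything up to the core interpreter and leave out later layers such as dynamic scope.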

Having built the program with just a subset of layers, you’re free to poke at
it and run what-if experiments. Why did Kartik write this line like so? Make a
change, run the tests. Oh, that’s why. You can add logging to trace through
execution, and you can use a debugger, because you’re sitting at your
workstation like a reasonable programmer, not curled up in bed.
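
The scenarios in the sample above pair each input form with a line starting with `=>` that gives the expected result. As a hedged sketch of how such plain-text tests might be read (this is not the real test harness; `Scenario`, `strip`, and `parse_scenarios` are hypothetical names, and it assumes `#` never appears inside a form):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one test case, an input form and the result it should yield.
struct Scenario {
  std::string input;
  std::string expected;
};

// Drop a trailing '#' comment, then trim leading and trailing blanks.
std::string strip(const std::string& line) {
  const std::string s = line.substr(0, line.find('#'));
  const std::size_t begin = s.find_first_not_of(" \t");
  if (begin == std::string::npos) return "";
  const std::size_t end = s.find_last_not_of(" \t");
  return s.substr(begin, end - begin + 1);
}

// Collect lines into an input form until a "=>" line closes the scenario.
std::vector<Scenario> parse_scenarios(const std::string& text) {
  std::vector<Scenario> result;
  std::istringstream in(text);
  std::string line, input;
  while (std::getline(in, line)) {
    const std::string s = strip(line);
    if (s.empty()) continue;  // blank line or comment-only line
    if (s.compare(0, 2, "=>") == 0) {
      result.push_back({input, strip(s.substr(2))});
      input.clear();
    } else {
      if (!input.empty()) input += ' ';  // join multi-line forms with a space
      input += s;
    }
  }
  return result;
}
```

A runner would then feed each `input` to something like `run` and compare the printed result against `expected`. Because the tests are just text sitting next to the code, changing a line and rerunning them is cheap.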

Eventually I’d like to live in a world where our systems for viewing live,
modifiable, interactive code are as adept at typesetting as our publishing
systems are. But until that day, I’ll choose simple markdown-like plain-text
documentation that the author labored over the structure of. Every single
time.

Footnotes

1. Literate Haskell and, to a lesser extent, CoffeeScript allow very flexible
ordering in the language, which mitigates this problem. But then we have their
authors telling us that their tools can be used with any language, blithely
ignoring the fact that other languages may need better tools. Everybody’s
selling mechanisms, nobody’s inculcating the right policies.

2. We’ve all had the little endorphin rush of seeing our crappy little papers
or assignments magically improved by sprinkling a little typesetting. And we
tend to take well-typeset documents more seriously. The flip side to this
phenomenon: if it looks done you won’t get as much feedback on it.



About the author: Vanic Onderkoff