Literate programming: Knuth is doing it wrong (2014)

Oct 3, 2014

Literate programming advocates this: Order your code for others to read,
not for the compiler. Beautifully typeset your code so one can curl up in bed
to read it like a novel. Keep documentation in sync with code. What’s not to
like about this vision? I have two beefs with it: the ends are insufficiently
ambitious, focusing on a passive representation; and the means are
insufficiently polished, over-emphasizing typesetting at the cost of prose
quality. Elaboration, in reverse order:

Canonizing typesetting over organization

When I look around at the legacy of literate programming, systems to do
so-called semi- or quasi-literate programming dominate. These are systems that
focus on generating beautifully typeset documentation without allowing the
author to arbitrarily order code. I think this is exactly backwards; codebases
are easy to read primarily due to the author’s efforts to orchestrate the
presentation, and only secondarily by typesetting improvements. As a concrete
example, just about every literate program out there begins with cruft like
this:[1]

// Some #includes

or:

-- Don't mind these imports.

I used to think people just didn’t understand Knuth’s vision. But then I went
and looked at his literate programs. Boom, #includes:

The example Pascal program in Knuth’s original paper didn’t have any imports
at all. But when it comes to organizing larger codebases, we’ve been putting
imports thoughtlessly at the top. Right from day one.

Exhibit 2:

“Skip ahead if you are impatient to see the interesting stuff.”
Well gee, if only we had, you know, a tool to put the interesting stuff up
front.

Exhibit 3:

This is the start of the piece. There’s a sentence of introduction, and then
this:

We use a utility field to record the vertex degrees.

#define deg u.I

That’s a steep jump across several levels of abstraction. Literally the first
line of code shown is a macro to access a field for presumably a struct whose
definition — whose very type name — we haven’t even seen
yet. (The variable name for the struct is also hard-coded in; but I’ll stop
nit-picking further.)
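
One way to soften that jump is to show the data type before the macro that
abbreviates access to it. The sketch below is an assumption for illustration,
not a reconstruction of Knuth's code: `Util`, `Vertex` and
`degree_after_adding_edges` are hypothetical names, and only the macro itself
comes from the example above.

```cpp
#include <cassert>
#include <string>

// Hypothetical general-purpose slot that a program can repurpose.
// Not copied from any real library.
union Util {
  long I;         // integer payload
  const char* S;  // string payload
};

// Hypothetical vertex type, presented first so the reader already knows
// what 'u' is by the time the macro shows up.
struct Vertex {
  std::string name;
  Util u;  // utility field; this program uses it to hold the vertex degree
};

// The macro from the example above, now readable in context:
// 'v.deg' expands to 'v.u.I'.
#define deg u.I

// Small demonstration: bump a vertex's degree once per incident edge.
long degree_after_adding_edges(int edges) {
  Vertex v{"a", {0}};
  for (int i = 0; i < edges; ++i) ++v.deg;
  assert(v.deg == v.u.I);  // both spellings name the same storage
  return v.deg;
}
```

With the struct in view, the one-line macro reads as a naming convenience
rather than a puzzle.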

Exhibit 4: Zoom out just a little bit on the previous example:





Again, there’s #includes at the top but I won’t belabor that. Let’s look at
what’s in these #includes. “GraphBase data structures” seems kinda relevant to
the program. Surely the description should inline and describe the core data
structures the program will be using. In the immortal words of Fred Brooks:

“Show me your flowcharts [code] and conceal your tables [data types], and I
shall continue to be mystified. Show me your tables, and I won’t usually need
your flowcharts; they’ll be obvious.”

Surely a system to optimize order for exposition shouldn’t be stymied by
code in a different file.

On the whole, people have failed to appreciate the promise of literate
programming because the early examples are just not that good, barring the
small program in Knuth’s original paper. The programs jump across abstraction
layers. Problems are ill-motivated. There’s a pervasive mindset of top-down
thinking, of starting from main, whether or not that’s easiest to read. The
ability to change order is under-used, perhaps because early literate tools
made debugging harder, but mostly I think because of all the emphasis, right
from the start, on just how darn cool the typesetting was.[2]

All this may seem harsh on Knuth, but I figure Knuth can take it. He’s, well,
Knuth, and I’m nobody. He came up with literate programming as the successor
to structured programming, meaning that he was introducing ordering considerations
at a time when most people were still using gotos as a matter of
course. There was no typesetting for programmers or academics, no internet, no
hyperlinking. No, these early examples are fine for what they are. They
haven’t developed because we programmers have failed to develop them
over time. We’ve been too quick to treat them as sacred cows to be merely
interpreted (not noticing the violence our interpretations do to the original
idea anyway). I speculate that nobody has actually read anybody else’s
literate programs in any sort of detail. And so nobody has been truly inspired
to do better. We’ve been using literate programming, like the vast majority of
us use TAOCP, as a signalling device to show that we are hip to what’s cool.
(If you have spent time reading somebody else’s literate programs, I want to
hear about your experiences!)

Canonizing passive reading over interactive feedback

I’ve been indirectly maligning typesetting, but it’s time to aim squarely at
it. There’s a fundamental problem with generating a beautifully typeset
document for a codebase: it’s dead. It can’t render inside just about any
actual programming environment (editor or IDE) on this planet, and so we can’t
make changes to it while we work on the codebase. Everybody reads a pdf about
a program at most once, when they first encounter it. After that, re-rendering
it is a pain, and proof-reading the rendered document, well forget about it.
That dooms generated documentation to be an after-thought, forever at risk of
falling stale, or at least rendering poorly.

You can’t work with it, you can’t try to make changes to it to see what
happens, and you certainly can’t run it interactively. All you can do,
literally, is curl up with it in bed. And promptly fall asleep. I mean, who
reads code in bed without a keyboard?!

What’s the alternative? In the spirit of presenting a target of my own for
others to attack, I’ll point you at some literate code I wrote last year for a
simple interpreter. A sample of what it looks like:

 // Programs are run in two stages:
 //  a) _read_ the text of the program into a tree of cells
 //  b) _evaluate_ the tree of cells to yield a result
 cell* run(istream& in) {
   cell* result = nil;
   do {
       // TEMP and 'update' help recycle cells after we're done with
       // them.
       // Gotta pay attention to this all the time; see the 'memory'
       // layer.
       TEMP(form, read(in));
       update(result, eval(form));
   } while (!eof(in));
   return result;
 }
 
 cell* run(string s) {
   stringstream in(s);
   return run(in);
 }
 
 :(scenarios run)
 :(scenario examples)
 # function call; later we'll support a more natural syntax for
 # arithmetic
 (+ 1 1)
 => 2
 
 # assignment
 (<- x 3)
 => 3
 
 # list; deliberately looks just like a function call
 '(1 2 3)
 => (1 2 3)
 
 # the function (fn) doesn't have to be named
 ((fn (a b)  # parameters (params)
     (+ a b))  # body
    3 4)  # arguments (args) that are bound to params inside this call
 => 7
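
The scenario block above pairs each expression with the result it should
produce. As a rough sketch of how such a block could be split into test cases
(an illustration under assumptions, not the actual harness; `Scenario` and
`parse_scenarios` are hypothetical names, and the input is assumed to be
unindented):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Scenario {
  std::string expr;      // expression text, possibly spanning several lines
  std::string expected;  // text following "=> "
};

// Splits scenario text into (expression, expected result) pairs.
// Full-line '#' comments and blank lines are skipped; trailing comments
// on an expression line are kept as part of the expression.
std::vector<Scenario> parse_scenarios(const std::string& text) {
  std::vector<Scenario> result;
  std::istringstream in(text);
  std::string line, expr;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    if (line.compare(0, 3, "=> ") == 0) {
      // An expected-result line closes the expression gathered so far.
      result.push_back({expr, line.substr(3)});
      expr.clear();
    } else {
      if (!expr.empty()) expr += '\n';
      expr += line;
    }
  }
  return result;
}
```

A real runner would then evaluate each `expr` and compare the printed result
against `expected`; that step depends on the interpreter and is left out here.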

A previous post describes the format, but we won’t need more details for this
example. Just note that it is simple plaintext that will open up in your text
editor. There is minimal prose, because just the order of presentation does so
much heavy lifting. Comments are like code: the less you write, the less there
is to go bad. I’m paying the cost of ‘//’ to delineate comments because I
haven’t gotten around to fixing it, because it’s just not that important to
clean it up. You can’t see it in this sample, but the program at large
organizes features in self-contained layers, with later features hooking into
the code for earlier ones. Here’s a test harness. (With, I can’t resist
pointing out, the includes at the bottom.) Here’s a garbage collector. Here I
replace a flat namespace of bindings with dynamic scope. In each case, code is
freely intermingled with tests to exercise it (like the scenarios above),
tests that can be run from the commandline.

 $ build_and_test_until 029  # exercises the core interpreter
 $ build_and_test_until 030  # exercises dynamic scope
 ...
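
A minimal sketch of the idea behind building only a prefix of the layers,
assuming purely for illustration that each layer lives in a file whose name
starts with a zero-padded number. `layers_until` and the file names in the
usage note are hypothetical, not taken from the real build script.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the layer files to build, in order, up to and including 'last'.
// Assumes every name begins with a zero-padded numeric prefix such as "029",
// so plain string comparison agrees with numeric order.
std::vector<std::string> layers_until(std::vector<std::string> files,
                                      const std::string& last) {
  std::sort(files.begin(), files.end());
  std::vector<std::string> selected;
  for (const std::string& f : files) {
    // Compare only the leading prefix of the file name against 'last'.
    if (f.compare(0, last.size(), last) <= 0) selected.push_back(f);
  }
  return selected;
}
```

For example, `layers_until({"030scope", "001test", "029eval"}, "029")` keeps
`001test` and `029eval` and leaves `030scope` out, so the later feature never
gets compiled in.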

Having built the program with just a subset of layers, you’re free to poke at
it and run what-if experiments. Why did Kartik write this line like so? Make a
change, run the tests. Oh, that’s why. You can add logging to trace through
execution, and you can use a debugger, because you’re sitting at your
workstation like a reasonable programmer, not curled up in bed.

Eventually I’d like to live in a world where our systems for viewing live,
modifiable, interactive code are as adept at typesetting as our publishing
systems are. But until that day, I’ll choose simple markdown-like plain-text
documentation that the author labored over the structure of. Every single
time.

footnotes

1. Literate Haskell and, to a lesser extent, CoffeeScript allow very flexible
ordering in the language, which mitigates this problem. But then we have their
authors telling us that their tools can be used with any language, blithely
ignoring the fact that other languages may need better tools. Everybody’s
selling mechanisms, nobody’s inculcating the right policies.

2. We’ve all had the little endorphin rush of seeing our crappy little papers
or assignments magically improved by sprinkling a little typesetting. And we
tend to take well-typeset documents more seriously. The flip side to this
phenomenon: if it looks done you won’t get as much feedback on it.


