Suggestions on Bootstrapping GHC (2018)


I am just back from the Reproducible Builds Summit 2018 in Paris. The latest hot topic within the reproducible-builds project seems to be bootstrapping: how can we build a whole operating system from just and only source code, using very few, or even no, binary seeds or auto-generated files? This concern is somewhat orthogonal to reproducibility: bootstrappable builds help me trust programs that I built, while reproducible builds help me trust programs that others built.

And while they are making good progress bootstrapping a full system from just a C compiler written in Scheme and a Scheme interpreter written in C that can build each other (Janneke’s Mes project), and there are plans to build that on top of stage0, which starts with just 280 bytes of binary, the situation looks pretty bad when it comes to Haskell.

Unreachable GHC

The problem is that contemporary Haskell has only one viable implementation, GHC. And GHC, written in contemporary Haskell, needs GHC to be built. So essentially everybody out there either just downloads a binary distribution of GHC, or builds GHC from source using a possibly older (but not much older) version of GHC that they already have. Even distributions like Debian do nothing different: when they build the GHC package, the builders use, well, the GHC package.

There are other Haskell implementations out there. But if they are mature and actively developed, then they are implemented in Haskell themselves, often even using advanced features that only GHC provides. And even those are insufficient to build GHC itself, let alone the few old and abandoned Haskell implementations.

In all these cases, at some point an untrusted binary is used. This is very unsatisfying. What can we do? I don’t have the answers, but please allow me to outline some avenues of attack.

Retracing history

Obviously, GHC has not existed since the beginning of time, and the first versions surely were built using something other than GHC. The oldest version of GHC for which we can find a release on the GHC web page is version 0.29 from July 1996. But the installation instructions say:

GHC 0.26 doesn’t build with HBC. (It could, but we haven’t put in the effort to maintain it.)

GHC 0.26 is best built with itself, GHC 0.26. We heartily recommend it. GHC 0.26 can certainly be built with GHC 0.23 or 0.24, and with some earlier versions, with some effort.

GHC has never been built with compilers other than GHC and HBC.

So it seems that besides GHC, only HBC was ever used to compile GHC. HBC is a Haskell compiler for which the sources of only one random version survive. Parts of it are written in C, so I looked into this: compile HBC, use it to compile GHC 0.29, and then step by step build every (major) version of GHC until today.

The problem is that it is non-trivial to build software from the 90s using today’s compilers. I briefly looked at the HBC code base and had to change some files from using varargs.h to stdarg.h, and this is surely just one of many similar stumbling blocks when trying to build those tools. Oh, and even the hbc source states:

# To get everything done: make universe
# It is impossible to make from scratch.
# You must have a running lmlc, to
# recompile it (of course).

So I learned that actually, most of it is written in LML, and the LML compiler is written in LML. So this is a dead end. (Thanks to Lennart for clearing up a misunderstanding on my side here.)

Going back, but doing it differently

Another approach is to go back in time to some old version of GHC, but maybe not all the way to the beginning, and then try to use another, officially unsupported, Haskell compiler to build GHC. This is what rekado tried to do in 2017: he used the most contemporary implementation of Haskell in C, the Hugs interpreter. Using this, he compiled nhc98 (yet another abandoned Haskell implementation), with the hope of building GHC with nhc98. He made impressive progress back then, but ran into a problem where the runtime crashed. Maybe someone is interested in picking up from there?

Removing, simplifying, extending, in the present

Both approaches so far focus on building an old version of GHC. This adds complexity: other tools (the shell, make, yacc etc.) may behave differently now, in ways that cause hard-to-debug problems. So maybe it is more fun and more rewarding to focus on today’s GHC? (At this point I am starting to hypothesize.)

I said before that no other existing Haskell implementation can compile today’s GHC code base, because of features like mutually recursive modules, the foreign function interface etc. And other existing Haskell implementations also often come with a different, smaller set of standard libraries, but GHC assumes the base package, so we would have to build that as well…

But we don’t need to build it all. Surely there is much code in base that is not used by GHC. Also, there is much code in GHC that we do not need in order to build GHC, and . So by removing all that, we reduce the amount of Haskell code that we need to feed to the other implementation.
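As a rough illustration of the pruning idea, here is a minimal sketch that computes which modules are transitively imported from a root module and lists the rest as candidates for removal. The import graph and module names are made up for the example, and the sketch assumes the containers package that ships with GHC. Real pruning of base would likely need to work at the level of individual definitions rather than whole modules, so this is at best a first, coarse pass.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical import graph: each module maps to the modules it imports.
type ModuleGraph = Map.Map String [String]

-- Depth-first search from the root modules, collecting every module
-- that is transitively imported. 'seen' guards against cycles.
reachable :: ModuleGraph -> [String] -> Set.Set String
reachable graph = go Set.empty
  where
    go seen [] = seen
    go seen (m : rest)
      | m `Set.member` seen = go seen rest
      | otherwise =
          let deps = Map.findWithDefault [] m graph
          in go (Set.insert m seen) (deps ++ rest)

-- Modules present in the graph but never reached from the roots:
-- these are the candidates we would not have to feed to Hugs or nhc98.
unused :: ModuleGraph -> [String] -> [String]
unused graph roots = filter (`Set.notMember` keep) (Map.keys graph)
  where
    keep = reachable graph roots

-- Illustrative graph only; the edges are invented, not taken from GHC.
exampleGraph :: ModuleGraph
exampleGraph = Map.fromList
  [ ("Main",         ["Driver", "Data.List"])
  , ("Driver",       ["Parser", "Data.IORef"])
  , ("Parser",       ["Data.List"])
  , ("Data.List",    [])
  , ("Data.IORef",   [])
  , ("Data.Complex", [])
  , ("Text.Printf",  ["Data.List"])
  ]

main :: IO ()
main = print (unused exampleGraph ["Main"])
```

Run as is, it prints `["Data.Complex","Text.Printf"]`: nothing reachable from `Main` imports those two modules, so they could be dropped before handing the code to the bootstrapping implementation.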

The remaining code might use some features that are not supported by our bootstrapping implementation. Mutually recursive modules could be manually merged. GADTs that are only used for additional type safety could be replaced by normal ones, which might make some pattern matches incomplete. Syntactic sugar can be desugared. By simplifying the code base in that way, one might be able to obtain a fork of GHC that is within reach of the likes of Hugs or nhc98.
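To make the GADT point concrete, here is a small sketch with a hypothetical expression type (invented for illustration, not taken from GHC’s code base). The GADT version uses its type index to rule out ill-typed terms, so its evaluator has no failure cases. The plain Haskell 98 replacement can represent ill-typed terms, so the evaluator suddenly has to cope with cases that used to be impossible.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- GADT version: the index 'a' rules out terms like Add (BoolE True) ...,
-- so evalT is total without any error cases.
data Expr a where
  IntE  :: Int -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

evalT :: Expr a -> a
evalT (IntE n)   = n
evalT (BoolE b)  = b
evalT (Add x y)  = evalT x + evalT y
evalT (If c t e) = if evalT c then evalT t else evalT e

-- Plain ADT version: same constructors, but no type index.
data UExpr
  = UIntE Int
  | UBoolE Bool
  | UAdd UExpr UExpr
  | UIf UExpr UExpr UExpr

data Value = IntV Int | BoolV Bool
  deriving (Eq, Show)

-- Ill-typed terms such as UAdd (UBoolE True) (UIntE 1) are now
-- representable; the catch-all cases below make the resulting
-- runtime failure explicit.
evalU :: UExpr -> Value
evalU (UIntE n)  = IntV n
evalU (UBoolE b) = BoolV b
evalU (UAdd x y) = case (evalU x, evalU y) of
  (IntV a, IntV b) -> IntV (a + b)
  _                -> error "evalU: Add applied to a non-Int"
evalU (UIf c t e) = case evalU c of
  BoolV True  -> evalU t
  BoolV False -> evalU e
  _           -> error "evalU: If condition is not a Bool"

main :: IO ()
main = do
  print (evalT (If (BoolE True) (Add (IntE 1) (IntE 2)) (IntE 0)))
  print (evalU (UIf (UBoolE True) (UAdd (UIntE 1) (UIntE 2)) (UIntE 0)))
```

Leaving out the catch-all cases would still compile, just with incomplete pattern matches, which is exactly the trade-off a bootstrapping fork would accept.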

And if there are features that are hard to remove, maybe we can extend the bootstrapping compiler or interpreter to support them? For example, it was mostly trivial to extend Hugs with support for the # symbol in names, and we can be pragmatic and just allow it always, since we don’t need a standards-conforming implementation, merely one that works on the GHC code base. But how much would we have to implement? Probably this will be more fun in Haskell than in C, so maybe extending nhc98 would be more viable?
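For readers unfamiliar with it, this is roughly what the # in names looks like in GHC code: with the MagicHash extension, identifiers may end in #, a convention GHC uses for unboxed types and primitive operations exported from GHC.Exts. The functions mulAdd, square# and sq are hypothetical examples; the point is only that names like I#, Int# and square# lex as single identifiers under MagicHash, whereas a Haskell 98 lexer would read I# as the constructor I followed by the operator #.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

-- GHC.Exts exports the unboxed Int# type, the boxing constructor I#,
-- and primitive operations such as (+#) and (*#).
import GHC.Exts (Int (I#), Int#, (+#), (*#))

-- An alphanumeric name ending in '#': only legal with MagicHash.
-- (Operators like +# and *# are ordinary operator symbols anyway.)
square# :: Int# -> Int#
square# x = x *# x

-- Unbox the arguments, compute on machine integers, box the result.
mulAdd :: Int -> Int -> Int -> Int
mulAdd (I# a) (I# b) (I# c) = I# ((a *# b) +# c)

sq :: Int -> Int
sq (I# x) = I# (square# x)

main :: IO ()
main = do
  print (mulAdd 6 7 0)
  print (sq 9)
```

This prints 42 and then 81. Teaching a bootstrapping lexer to accept a trailing # on identifiers is the kind of small, pragmatic extension described above.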

Help from beyond Haskell?

Or maybe it is time to create a new Haskell compiler from scratch, written in something other than Haskell? Maybe some other language that is reasonably pleasant to write a compiler in (OCaml? Scala?), but that already has its bootstrappability story sorted out somehow.

But in the end, all variants come down to the same problem: writing a Haskell compiler for full, contemporary Haskell as used by GHC is hard and really a lot of work. If it were not, there would at least be implementations in Haskell out there. And as long as nobody comes along and does that work, I fear that we will continue to be unable to build our nice Haskell ecosystem from scratch. Which I find somewhat dissatisfying.
