Developing the Bound Compiler: Segment 4
June 15, 2020
11 min be taught
Closing as much as this point: January 13, 2021
This post is shatter up into 2 halves: the first half explains the hypothesis within the abet of form-checkers, and the 2nd half presents you an intensive deep-dive into the contrivance in which it’s applied in our compiler for our language Bound. Even in case you aren’t attracted to writing your possess language, the first half is precious in case you’ve ever heard about form systems and are desirous to clutch how they even work!
Fraction This On Twitter
Must you peep this precious, please share this on Twitter. I’m writing up posts on enforcing generics and inheritance on this language too!
What are styles?
We’d talked about this temporarily within the first a part of the compiler sequence, on the opposite hand let’s revisit this:
Featured Content Ads
add advertising hereForms are a form of grouping program values in accordance to the behaviour we’d bask in them to maintain. We now maintain our possess “styles” in our day-to-day lives. E.g. grouping folk as adults and teens.
Grouping values collectively system the compiler doesn’t maintain as many cases to take care of. It system we don’t must peep at the exact tag, we just appropriate cause about it in accordance to the neighborhood it’s in (its form).
E.g. the values 0
, 1
, 2
, 3
… and all other integers are all given the form int
. Values of form int
can spend part in arithmetic operations bask in multiplication on them but we can’t concatenate them (bask in string
s). Survey how the form (int
vs string
) defines the behaviour of the tag (what we can fabricate with them)?
It’s even clearer in our customized styles. E.g ought to you interpret a personalised Animal
class with spend()
and sleep()
programs, you’re asserting that objects of form Animal
maintain spend
and sleep
behaviour. But we can’t divide two Animal
objects by one but another – that’s no longer the behaviour we’d allow. What would dogs / cat
even mean?
Featured Content Ads
add advertising hereThe role of the form-checker is to cease this originate of nonsensical behaviour from going on. That’s what we’re constructing this present day.
All smartly-behaved languages maintain form-checking in some originate. Statically typed languages bask in Rust, Java or Haskell take a look at the styles at compile-time. Dynamically-typed languages bask in JS and Python fabricate mild maintain styles – values are tagged with styles at runtime, and they take a look at styles when executing. Must you strive to bustle 5 / "Hello"
, it received’t basically bustle the code, JS/Python will look "Hello
has form string
and can throw a runtime error in preference to executing it.
Exact now, this all feels a minute bit hand-wavey. Let’s formalise “combating nonsensical behaviour”. We wish a record of principles to maintain a examine a program against, after which if it passes these, all of us know our program is real.
This collection of principles is is legendary as a form machine.
Kind Systems
Right here’s the deal: styles interpret real behaviour. So if we can give an expression a form, we can spend its form to originate particular it’s being extinct within the factual system. The form machine’s principles (called typing judgements) therefore strive to put a form to an expression .
We originate by defining seemingly evident information, bask in factual
is of form bool. In our form machine, the rule is written as follows:
Don’t be panicked by the math notation:
system “it follows that”. You would possibly be taught this as “it follows that the expression factual
has form bool
. In regular,
is also be taught as “it follows that expression e has form t”.
Right here’s a couple more evident information.
for any integer .
Typing environments
Ingredient is, wanting at an expression on its possess isn’t constantly sufficient for us to form-take a look at it. What’s the originate of a variable x
? What ought to mild we put rather than the ?
below?
Well it is reckoning on what form it modified into given when it modified into defined. So when form-checking, we’ll must preserve note of the variables’ styles as they are defined. We call this the typing atmosphere, represented in our principles as . Deem of as a look-up feature that maps variables to their styles.
We update our typing principles to consist of :
.
Inspire to our typing rule right here:
If , then we can command that:
.
In our typing principles we stack these two statements to provide one compound typing rule for variables. Right here’s an example of an inference rule.
Inference principles
This abnormal stacked “fraction” is a form of representing deductive reasoning (bask in Sherlock Holmes!). If every little thing on the highest holds, then we infer the underside additionally holds.
Right here’s one other rule. This says that if all of us know
and
are int
s, then if we add them collectively the consequence is additionally an int
.
The utilization of these inference principles, we can stack collectively our pieces of proof about diverse parts of the program to cause about the entire program. Right here’s how we make up our form machine – we stack our principles.
Let’s mix what we’ve learnt so a ways to form-take a look at a minute bit expression: x + 1
where our atmosphere
tells us x
is an int
.
Now if it turns out x
is an string
, then
doesn’t preserve, so the entire principles stacked below it don’t preserve. Our form-checker would elevate an error, as the styles don’t match up!
Axioms
It’s convention right here to affirm the information (axioms) with a line above them. You would possibly deem them as the “contaminated case”:
Since they don’t maintain something else on the highest, technically every little thing on the highest is factual. So the underside constantly holds.
Must you peep abet to our example and flip it the opposite contrivance up, it originate of looks bask in a tree. It additionally lays out our reasoning, so acts as a proof – you’ll likely be ready to just appropriate note the steps. Why is x+1
an int? Well x
and 1
are int
s? Why is x
an int? On story of… You secure the assumption. So what we’ve constructed right here is is legendary as a proof tree.
Ample, let’s peep at one other rule. What about an if-else
block?
if (something) {
do_one_thing
} else {
do_something_else
}
Well, the something
has to maintain form bool
as it’s a situation. What about the branches? We now must give this expression an total form, but on story of we’re statically form-checking, we don’t know which department will be completed at runtime. As a consequence of this truth we need each branches to return the comparable form (call it
), so the expression has the the same form regardless of which department we fabricate.
JavaScript and Python would allow diverse styles for every department, since they’re doing their form-checking at runtime (dynamically typed), so they know which department modified into chosen.
Typing the Total Program
This post would dwell up being too lengthy if I maintain been to record the entire principles, but you’ll likely be ready to work by each of the cases. We peep at the grammar we defined within the previous post and write the corresponding principles.
We’re desirous to expose the program is properly-typed (you’ll likely be ready to put it a form) to expose it is real. So we in fact want a proof tree that reveals it has some form (we don’t care which):
There’s just appropriate one ingredient lacking right here. Before every little thing – it’s empty as we haven’t defined any variables. How fabricate we add variables to ?
Keep in mind, we add variables to
as we expose them in our program. The syntax for that is a let
declaration. So command we maintain the next program:
We form-take a look at this within the issue the program executes:
- We secure the originate of the expression
, call it - We then lengthen the atmosphere
with the brand new mapping . - We spend this extended atmosphere (which we write as
) to form-take a look at and give it form .
The entire typing rule thus looks bask in:
Kind Checking vs Kind Inference
One finer point: right here we wrote let x = e
. Shall we maintain equally written this as let x : t = e
– where the programmer annotates x
with form t
. e.g. let x : int = 1 + 2
.
These lead to diverse typing algorithms – within the first case the compiler infers that 1+2
has form int
, and within the 2nd case, the compiler has to take a look at that 1+2
has form int
(since we specified the form int
of x
).
As you would possibly well perhaps agree with, form inference system that the programmer has to jot down fewer form annotations, but it absolutely is a ways more complex for the compiler – it’s bask in filling a Sudoku from scratch versus checking a Sudoku resolution is suited. OCaml and Haskell are examples of languages with form inference baked in.
In practice, most statically-typed languages fabricate require some form annotations, but can infer some styles (e.g. the auto
keyword in C++). Right here’s bask in finishing a partially solved Sudoku puzzle and is a ways more straightforward.
For Bound, we’re going to infer styles inner a feature or system definition, but require programmers to annotate the parameter and return styles. Right here’s a pleasant heart floor.
example_function.race
feature int something(int x, bool y){
let z = ...
...
}
Ample, sufficient theory, let’s secure to enforcing this form-checker!
I originate assert material about my instrument engineering whisk, curated in my publication!
Are desirous to be taught about how more appropriate language aspects bask in generics and inheritance are typed? Subscribe to receive *abnormalentry to future posts sooner than they’re launched to the regular web.
Implementing a Kind-checker
Exact give me the code!
You’ll must clone the Bound repository:
git clone https://github.com/mukul-rathi/race
The Bound repository grasp
department contains a more complex form-checker, with give a boost to for inheritance, feature/system overloading and generics. Each of these issues will secure its possess special post later within the sequence.
So as an different, you’ll are desirous to peep at the straightforward-compiler-tutorial
free up. You would possibly fabricate this as follows:
git checkout straightforward-compiler-tutorial
This contains a stripped-abet model of Bound from earlier within the approach processs. (Search files from this on-line)
The folder we care about is src/frontend/typing
.
A expose on OCaml syntax
On this tutorial we’ll be the utilization of OCaml syntax. Must you’re abnormal with this, the principle gist is that we’ll:
a) Pattern match each of the cases (bask in switch
statements in other languages). Right here we maintain a variable x
and we fabricate diverse things in accordance to each of its cases A
, B
match x with
| A -> something
| B -> something_else
b) Use a Result monad: in essence, this has two values: Ample something
and Error
. We sequence each of the operations the utilization of the >>=
operator – you don’t must know something else about monads for this tutorial, just appropriate think this as the corresponding to regular expression sequencing with ;
s, just appropriate that we’re the utilization of 1 other operator to affirm that earlier expressions would possibly well elevate an error.
Forms in Bound
Bound has four fundamental styles: int
, bool
, void
and user-defined classes. We affirm these four alternate ideas the utilization of a variant form type_expr
in OCaml:
form type_expr = TEInt | TEClass of Class_name.t | TEVoid | TEBool
Annotating our AST with styles
Let’s recap our Abstract Syntax Tree from the final a part of the sequence:
form identifier =
| Variable of Var_name.t
| ObjField of Var_name.t * Field_name.t
form expr =
| Integer of loc * int
| Boolean of loc * bool
| Identifier of loc * identifier
| Constructor of loc * Class_name.t * constructor_arg record
| Let of loc * type_expr option * Var_name.t * expr
| Set of loc * identifier * expr
| If of loc * expr * block_expr * block_expr
...
and block_expr = Block of loc * expr record
This AST annotates each expression with loc
– the line and placement of the expression. In our form-checking part, we’ll be checking the styles of every of the likely expressions. We’ll are desirous to store our outcomes by straight annotating the AST, so the next compiler stage can look the styles just appropriate by wanting at the AST.
This AST gets the imaginative title typed_ast
:
form identifier =
| Variable of type_expr * Var_name.t
| ObjField of Class_name.t * Var_name.t * type_expr * Field_name.t
form expr =
| Integer of loc * int
| Boolean of loc * bool
| Identifier of loc * identifier
| Constructor of loc * type_expr * Class_name.t * constructor_arg record
| Let of loc * type_expr * Var_name.t * expr
| Set of loc * type_expr * identifier * expr
| If of loc * type_expr * expr * block_expr * block_expr
...
and block_expr =
| Block of loc * type_expr * expr record
We don’t annotate evident styles, bask in for Integer
and Boolean
, but we annotate the originate of the total expression for other expressions e.g. the form returned by an if-else
assertion.
A lawful rule of thumb when annotating the AST is, what would the next stage ought to mild be suggested about the program that it ought to’t guess from it being properly-typed? For an if-else
assertion, whether it is properly-typed then the if-situation expression is clearly of form bool
, but we’d ought to mild be suggested the originate of the branches.
The Kind Atmosphere
Have interaction that we extinct our form atmosphere to peep up the styles of variables. We are able to store this as a record of bindings (variable, form) pairs.
type_env.ml
additionally contains a “atmosphere” of helper functions that we’ll spend on this form-checking part. These are largely tedious getter programs that you just’ll likely be ready to peep at within the repo.
form type_binding = Var_name.t * type_expr
form type_env = type_binding record
val get_var_type : Var_name.t -> type_env -> loc -> type_expr Or_error.t
...
It additionally ideas a couple of functions that take a look at we can put to an identifier (it’s no longer const
or the special identifier this
) and that take a look at we don’t maintain replica variable declarations (shadowing) within the the same scope. These again are conceptually easy but are the biggest just appropriate to quilt edge cases.
For example:
let check_identifier_assignable class_defns identity env loc =
let originate Result in
match identity with
| Parsed_ast.Variable x ->
if x = Var_name.of_string "this" then
Error
(Error.of_string
(Fmt.str "%s Kind error - Assigning expr to 'this'[email protected]" (string_of_loc loc)))
else Ample ()
| Parsed_ast.ObjField (obj_name, field_name) ->
get_obj_class_defn obj_name env class_defns loc
>>= relaxing class_defn ->
get_class_field field_name class_defn loc
>>= relaxing (TField (modifier, _, _, _)) ->
if modifier = MConst then
Error
(Error.of_string
(Fmt.str "%s Kind error - Assigning expr to a const [email protected]"
(string_of_loc loc)))
else Ample ()
Typing Expressions
This. Right here’s the crux of the implementation. So in case you’ve been skimming the post, right here’s where you ought to mild hear.
What fabricate we must form-take a look at an expression?
We need the category definitions and maintain definitions, in case we must put a question to the styles of fields and maintain/system form signatures. We additionally need the expression itself, and the typing atmosphere we’re the utilization of to form-take a look at it.
What fabricate we return? The typed expression, along with its form (we return form individually to originate recursive calls less difficult). Or we return an error, if the expression is no longer properly-typed.
Our feature form signature captures this precisely:
val type_expr :
Parsed_ast.class_defn record
-> Parsed_ast.function_defn record
-> Parsed_ast.expr
-> type_env
-> (Typed_ast.expr * type_expr) Or_error.t
Within the 2nd post on this sequence, I talked about why we’re the utilization of OCaml. This stage of the compiler is one where it in actuality pays off.
For example to form an identifier, we pattern match in accordance as to whether or no longer it is a variable x
or an object self-discipline x.f
. If it is a variable then we secure its form from the atmosphere (we pass in loc
as line+location files for error messages). If it returns a var_type
without an error, then we return the form-annotated variable.
If it is an object self-discipline x.f
, then we must peep up the originate of object x
within the env
and secure its corresponding class definition. We are able to then peep up the originate of self-discipline f
within the category definition. Then we annotate the identifier with the two bits of form files we’ve just appropriate learnt: the category of the thing, and the self-discipline form.
Our code does precisely that, and not utilizing a boilerplate:
let type_identifier class_defns identifier env loc =
let originate Result in
match identifier with
| Parsed_ast.Variable var_name ->
get_var_type var_name env loc
>>| relaxing var_type -> (Typed_ast.Variable (var_type, var_name), var_type)
| Parsed_ast.ObjField (var_name, field_name) ->
get_obj_class_defn var_name env class_defns loc
>>= relaxing (Parsed_ast.TClass (class_name, _, _, _) as class_defn) ->
get_class_field field_name class_defn loc
>>| relaxing (TField (_, field_type, _, _)) ->
(Typed_ast.ObjField (class_name, var_name, field_type, field_name), field_type)
Exact, so let’s peep at expressions. This again is easy code that just appropriate reads bask in our typing judgements. We now maintain expr1 binop expr2
where expr1
and expr2
are being blended the utilization of some binary operator e.g. +
.
Let’s remind ourselves of the rule:
What did this command? We form-take a look at the subexpressions
and
first. If they’re each int
s then the total expression is an int
.
Right here’s the principle conceptual jump: form-checking each of the expressions on the highest of the judgements corresponds to recursively calling our form-checking feature on the subexpressions. We then mix the implications of these form-checking judgments the utilization of our rule.
So right here, we form-take a look at each of the subexpressions expr1
and expr2
(type_with_defns
just appropriate makes the recursive call shorter).
let rec type_expr class_defns function_defns (expr : Parsed_ast.expr) env =
let originate Result in
let type_with_defns = type_expr class_defns function_defns in
let type_block_with_defns = type_block_expr class_defns function_defns in match expr with
| Parsed_ast.BinOp (loc, bin_op, expr1, expr2) -> (
type_with_defns expr1 env
>>= relaxing (typed_expr1, expr1_type) ->
type_with_defns expr2 env
>>= relaxing (typed_expr2, expr2_type) ->
Recursive calls, take a look at.
Subsequent, on story of our code presents with more binary operators than just appropriate +
we a minute bit generalise our typing judgement. In spite of whether or no longer it is &&
+
or >
, the operands expr1
and expr2
must maintain the the same form. Then we take a look at that this form is int
for the + / %
arithmetic operator cases.
If that is the case, then all Ample
and we return TEInt
as the originate of the operand.
if no longer (expr1_type = expr2_type) then
Error ...
else
let type_mismatch_error expected_type actual_type = ...
match bin_op with
| BinOpPlus | BinOpMinus | BinOpMult | BinOpIntDiv | BinOpRem ->
if expr1_type = TEInt then
Ample (Typed_ast.BinOp (loc, TEInt, bin_op, typed_expr1, typed_expr2), TEInt)
else type_mismatch_error TEInt expr1_type
...
As one other example of a binary operator, let’s peep at < <= >>> >=
. These take in integers and return a bool, so we take a look at that the operand expr1
has form TEInt
and we return TEBool
| BinOpLessThan | BinOpLessThanEq | BinOpGreaterThan | BinOpGreaterThanEq ->
if expr1_type = TEInt then
Ample (Typed_ast.BinOp (loc, TEBool, bin_op, typed_expr1, typed_expr2), TEBool)
else type_mismatch_error TEInt expr1_type
Ample, mild with me? This post is getting lengthy, so let’s just appropriate peep at if-else
statements and let
expressions, after which you’ll likely be ready to peep at the rest of the code within the repo.
All over again, let’s remind ourselves of our typing judgement for if-else
.
That’s three expressions on top of the inference rule.
What fabricate these expressions correspond to? Direct it with me, recursive calls!
| Parsed_ast.If (loc, cond_expr, then_expr, else_expr) -> (
type_with_defns cond_expr env
>>= relaxing (typed_cond_expr, cond_expr_type) ->
type_block_with_defns then_expr env
>>= relaxing (typed_then_expr, then_expr_type) ->
type_block_with_defns else_expr env
>>= relaxing (typed_else_expr, else_expr_type) ->
Now we must maintain a examine that the returned styles are what we anticipated, i.e. the branches maintain the the same form, and the location expression has form TEBool
. If that’s the case, it’s all Ample
and we can return the originate of the department.
if no longer (then_expr_type = else_expr_type) then
Error ...
else
match cond_expr_type with
| TEBool ->
Ample
( Typed_ast.If
(loc, then_expr_type, typed_cond_expr, typed_then_expr, typed_else_expr)
, then_expr_type )
| _ ->
Error ...
Exact, final expression for this post, the let
expression, which looks bask in this:
All over again, this requires two recursive calls, but expose that the 2nd typing judgement requires the form of the first – we must pass in a long form atmosphere (after we form-checked ) to story for this.
We additionally maintain an additional requirement: we need our let
expressions to be block-scoped.
block_scoped.race
if {
let x = ...
} else {
}
So in essence, we’re desirous to best update the atmosphere
- if we maintain a
let
expression, - best for subsequent expressions within the block
We are able to encode this by pattern-matching on our block form-checking rule. If there have to no longer any subsequent expressions (i.e. we fabricate no longer maintain any expressions (pattern-match on []
) or simply appropriate one expression (pattern-match on [expr]
) within the block), then we don’t must update our atmosphere.
type_block_expr class_defns function_defns (Parsed_ast.Block (loc, exprs)) env =
...
>>= relaxing () ->
match exprs with
| [] -> Ample (Typed_ast.Block (loc, TEVoid, []), TEVoid)
| [expr] ->
type_with_defns expr env
>>| relaxing (typed_expr, expr_type) ->
(Typed_ast.Block (loc, expr_type, [typed_expr]), expr_type)
Handiest if we maintain no lower than two expressions left in our block (pattern-match on expr1 :: expr2 :: exprs
), and the first of the two is a let
expression fabricate we update the atmosphere for the rest of the block. We spend this as much as this point atmosphere in a recursive call on expr2::exprs
(the final expressions). We then mix the consequence of the typed blocks.
| expr1 : : expr2 : : exprs ->
type_with_defns expr1 env
>>= relaxing (typed_expr1, expr1_type) ->
(let updated_env =
match typed_expr1 with
| Typed_ast.Let (_, _, var_name, _) -> (var_name, expr1_type) : : env
| _ -> env in
type_block_with_defns (Parsed_ast.Block (loc, expr2 : : exprs)) updated_env)
>>| relaxing (Typed_ast.Block (_, _, typed_exprs), block_expr_type) ->
(Typed_ast.Block (loc, block_expr_type, typed_expr1 : : typed_exprs), block_expr_type)
Typing Class and Purpose Definitions
We’ve broken the abet of the form-checking now. Let’s just appropriate wrap up by checking class and maintain definitions.
We’ll skip over the tedium of checking that there have to no longer any replica class definitions or replica self-discipline definitions in a category etc. (that is within the repo). Likewise, checking that the styles annotated in fields and system/feature form signatures are valids is candy a matter of checking there would possibly be a corresponding class definition.
Unlike our fundamental feature, for other functions, we don’t basically originate with an empty atmosphere when form-checking the feature body as we already know the styles of some variables – the parameters to the feature!
let init_env_from_params params =
Checklist.scheme
~f: (feature TParam (type_expr, param_name, _, _) -> (param_name, type_expr))
params
...
type_block_expr class_defns function_defns body_expr(init_env_from_params params)
We additionally must overview that the originate of the consequence of the body is the return form (or whether it is void
then we don’t care about the form returned).
>>= relaxing (typed_body_expr, body_return_type) ->
if return_type = TEVoid || body_return_type = return_type then
Ample
(Typed_ast.TFunction
(func_name, maybe_borrowed_ret_ref, return_type, params, typed_body_expr))
else Error ...
For class programs, we can initialise the atmosphere and take a look at the return form within the the same system. But all of us know the originate of 1 other special variable this
– the category itself.
let init_env_from_method_params params class_name =
let param_env =
Checklist.scheme
~f: (feature TParam (type_expr, param_name, _, _) -> (param_name, type_expr))
params in
(Var_name.of_string "this", TEClass class_name) : : param_env
```
The set does this fit within the Bound pipeline?
We’re two phases into the Bound compiler pipeline – viewed within the compile_program_ir
feature.
let compile_program_ir ?(should_pprint_past = misleading) ?(should_pprint_tast = misleading)
?(should_pprint_dast = misleading) ?(should_pprint_drast = misleading)
?(should_pprint_fir = misleading) ?(ignore_data_races = misleading) ?compile_out_file lexbuf =
let originate Result in
parse_program lexbuf
>>= maybe_pprint_ast should_pprint_past pprint_parsed_ast
>>= type_program
>>= maybe_pprint_ast should_pprint_tast pprint_typed_ast
>>= ...
Eradicate Away: 3 Actionable Steps
Must you’ve obtained this a ways, favorable job! The Bound repo contains the elephantine code record with the entire typing judgements for every case of the compiler.
Let’s recap what we’ve accomplished so a ways:
- Clarify what properties for your expression it’s essential to must form-take a look at. E.g. a situation expression has form
bool
. - Formalise this with a typing judgement. Use the inference rule to cause about subexpressions.
- Plan the subexpressions within the inference rule to recursive calls, then spend the inference rule to combine their outcomes.
Subsequent up, we’ll bid about the dataflow prognosis extinct to form-take a look at our linear capabilities in Bound. Right here’s just like how the Rust borrow checker makes spend of “non-lexical lifetimes” to maintain a examine borrowing.