Hot-code reloading on macOS/arm64 with Zig


Hot-code reloading, or hot-code swapping, is the ability of the compiler to enable you, the developer, to make
changes to your program seem instantaneous while the program is already running. In general, compilers
utilise the concept of dynamic library hot swapping (for native targets such as macOS) where your program (logic) is
compiled and linked down to a dynamic library (shared object) which is then managed and reloaded by
a small loader program running side-by-side [1]. Note that both programs, the dynamic library and the loader,
do not share the same address space.

In Zig [2], we would like to take a look at something else. Instead of having your program managed by a side-by-side
loader program, what if the compiler would “simply” update the memory of the running process? You will most
probably think I have gone completely crazy, which given the current state of global affairs is not unthinkable,
but hear me out.

Here’s my claim. We can pull it off using the self-hosted Zig compiler (also known as the stage2 compiler). We can pull
it off on Linux [3] and macOS (Windows coming soon TM), and here’s how. The self-hosted Zig compiler is a
little bit special because it couples very tightly with our in-house linker [4]. This gives it special powers which
effectively mean we can completely bypass the concept of creating relocatable object files for Zig modules in favour
of writing the already-relocated declarations/symbols directly into the final binary. I’ll refer to this concept
as incremental compilation, but some prefer to call it in-place binary patching, or incremental linking.
Either way, the point is, the compiler does not generate any intermediate relocatable object files.

From incremental compilation…

Hmm, OK, Jakub, whatever you say. How about an example to illustrate what you mean? Sure thing, only one more
sentence of explanation before we dive into the world of incremental compilation. The granularity at which the compiler
works is scoped down to a single declaration, aka decl or just symbol, which is then incrementally allocated space
in the virtual memory and a file offset, and written to the binary file (there are a couple more things that actually happen here
such as resolving relocations, if any, but you get the point). This allows us to update only those symbols that
actually changed, in an incremental fashion.

OK, example time! Consider the following Zig code

// example.zig

pub fn main() void {
    var x: u32 = 1;
    _ = bar(); // This is so that we force the compiler to generate bar before addToBar in the address space.
    const y = addToBar(x);
    assert(y == 11);
}

fn addToBar(x: u32) u32 {
    const y = bar();
    return x + y;
}

fn bar() u32 {
    return 10;
}

fn assert(ok: bool) void {
    if (!ok) unreachable;
}

In order to put Zig into incremental compilation mode, we can use a special flag --watch like so

$ zig build-exe example.zig --watch
(zig)

By this point, the compiler has created a fully functional binary that we can run from disk. However, since the
--watch flag puts the compiler in REPL mode, we can update-and-run directly from the REPL loop

(zig) update-and-run
(zig)

In this case, no output is good news as this means we didn’t hit the assert. Let’s tweak the assert
to something false though, just to check that everything is working as expected

// ...
assert(y == 12);
// ...

and then retry update-and-run in the REPL

(zig) update-and-run
warning: process aborted abnormally
(zig)

Hmm, right, so the assertion was correctly triggered in this case, and the compiler reported that the binary
did not exit cleanly, just as we expected.

OK, but how does any of this lend itself towards hot-code reloading in Zig? Right, let’s do another tweak to the
source where we change the definition of bar to something longer so that the linker will be forced to move
the symbol to a new location in the file and virtual memory

fn bar() u32 {
    assert(true);
    assert(true);
    assert(true);
    assert(true);
    return 10;
}

Let’s update-and-run

(zig) update-and-run
(zig)

So far so good. If we now analyse the before and after of the update-and-run step, we can show that bar was moved
from its initial address in virtual memory of 0x1000010c0 to 0x100001178 because it grew too big to be accommodated
in its original place. I’ll pause here for a second and pull up a “printout” from a debugging tool [5] I wrote to
aid in visualising changes to the binary between incremental updates

[Image: bar-moved]

There are two columns in the picture: the left hand side depicts the contents of the virtual memory before the next
incremental update, and the right hand side depicts the contents after the incremental update. I have purposefully
highlighted the symbol bar which, as predicted, has been moved in memory from 0x1000010c0 to 0x100001178
because it grew too big to fit its current placeholder (NB Zig’s incremental MachO linker does insert some
padding between symbols so that they can grow without necessitating a move, but in this case, we purposefully
grew the contents of bar enough to trigger the move and reallocation in virtual memory).
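To make the "padding lets a symbol grow in place, otherwise it moves" idea concrete, here is a minimal C sketch. It is not the Zig linker's actual allocator: the Atom struct, the one-third padding policy and the bump-pointer free list are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One symbol ("atom") placed in virtual memory with some spare capacity. */
typedef struct {
    uint64_t addr;     /* start address of the symbol */
    uint64_t size;     /* bytes currently used */
    uint64_t capacity; /* bytes reserved, including padding */
} Atom;

/* Hypothetical padding policy: reserve one third extra room, 4-byte aligned. */
uint64_t padded_capacity(uint64_t size) {
    uint64_t cap = size + size / 3;
    return (cap + 3) & ~(uint64_t)3;
}

/* Place a new atom at *next_free and bump the free pointer past it. */
Atom atom_alloc(uint64_t *next_free, uint64_t size) {
    assert(next_free != 0);
    Atom a = { *next_free, size, padded_capacity(size) };
    *next_free += a.capacity;
    return a;
}

/* Resize an atom. Returns true if it had to move to a new address. */
bool atom_resize(Atom *a, uint64_t *next_free, uint64_t new_size) {
    if (new_size <= a->capacity) {
        a->size = new_size; /* fits in the padding: update in place */
        return false;
    }
    *a = atom_alloc(next_free, new_size); /* too big: reallocate at the end */
    return true;
}
```

A small growth of bar would be absorbed by the padding and leave its address untouched, while a large one (like the four extra calls above) forces a new address, which is the case the rest of this post cares about.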

But what about any caller of bar? Did any symbol calling bar need a full rewrite? The short answer is no. Why,
you ask? Let’s pull up another view of the changes to the virtual memory contents of the file between updates, and
in particular, let us zoom in on addToBar

[Image: addtobar-unchanged]

The contents of the highlighted addToBar depict any relocation to any other symbol within the binary image.
Note that addToBar doesn’t make a direct reference to bar; instead, it references a mysterious cell in the
global offset table (GOT), denoted here as section __DATA_CONST,__got. The cell is located at address 0x100054028.
Let’s pull up its contents in both views

[Image: bar-got-cell]

Note that the cells in both views still mention bar, but the cell on the left hand side points to bar
at its original address of 0x1000010c0, while the cell on the right hand side points to its new address after the move,
0x100001178. In other words, in order to preserve the integrity of the calls, all the linker had to do
was update the target address of bar in its GOT cell. There was no need to touch any other symbol which
called bar, as every reference to it is done via the GOT. This mechanism lends itself really well
to hot-code reloading as it minimises the number of changes the linker has to make to the binary, and this will
be the cornerstone of our hot-code reloading solution. Let’s get right to it then!
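The GOT trick can be mimicked in plain C with a function pointer standing in for the GOT cell. This is only an analogy under stated assumptions: got_bar, relink_bar and the two bar versions are hypothetical names, and bar_v2 deliberately returns a different value so that the switch is observable.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*fn_u32)(void);

/* Two "versions" of bar living at different addresses. */
uint32_t bar_v1(void) { return 10; }
uint32_t bar_v2(void) { return 20; }

/* A one-entry stand-in for the __got section: the slot holds bar's current address. */
fn_u32 got_bar = bar_v1;

/* The caller never embeds bar's address; it always loads it from the slot. */
uint32_t add_to_bar(uint32_t x) {
    assert(got_bar != 0);
    return x + got_bar();
}

/* What the "linker" does after moving bar: rewrite the slot, nothing else. */
void relink_bar(fn_u32 new_addr) {
    got_bar = new_addr;
}
```

Note that add_to_bar is never touched when bar "moves"; only the single pointer-sized slot changes, which is why a GOT-based scheme keeps the amount of patched memory small.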

…to hot-code reloading…

Before we move on, I’ll point out that in the rest of this post, I’ll mainly address Mach and macOS specific
bits to get the ball rolling with respect to hot-code reloading with the Zig compiler. One additional bit
required to actually get it all pieced together into a working solution is to roll out some mechanism for communicating
with the compiler while in hot-code reloading mode, as communicating via stdio will be unavailable: we will be
piping the output of the hot-code reloaded child process (our binary) via the compiler. Therefore, one could for
instance communicate via a socket, and this is exactly what both Andrew Kelley [3] in his Linux proof-of-concept
and I in my macOS proof-of-concept have done. Anyways, in what follows, I’ll assume we already have the necessary infrastructure
to do this.

I should also point out that if you would like to browse, and more importantly, play with a working
version of the Zig compiler with hot-code reloading enabled on macOS, you can find the relevant source code
in the hcs-macos branch of the main Zig repo on GitHub [6].

First things first, we need to turn off address space layout randomisation (ASLR) for the child process (NB this is
actually not true, and we can successfully perform hot-code reloading with ASLR on. If you’re interested in this
bit, scroll down to the next section). On macOS, to do this from user space, we need to utilise the posix_spawn
family of functions for spawning and executing a child process. In particular, we’re interested in this function

int
posix_spawnp(pid_t *restrict pid, const char *restrict file, const posix_spawn_file_actions_t *file_actions,
             const posix_spawnattr_t *restrict attrp, char *const argv[restrict], char *const envp[restrict]);

In order to request ASLR off from the OS, we need to pass an attribute object posix_spawnattr_t with
flags containing _POSIX_SPAWN_DISABLE_ASLR=0x100. In Zig, I have made sure there are nice wrappers for
this, so all we need to do is

const std = @import("std");
const darwin = std.os.darwin;
const posix_spawn = std.os.posix_spawn;

// ...

var attr = try posix_spawn.Attr.init();
defer attr.deinit();
const flags: u16 = darwin.POSIX_SPAWN_SETSIGDEF | darwin.POSIX_SPAWN_SETSIGMASK | darwin._POSIX_SPAWN_DISABLE_ASLR;
try attr.set(flags);

const pid = try posix_spawnp(exe_path, null, attr, null, null);

(NB actually, since a few days ago, you don’t even need to use the posix_spawn
primitives at all, as spawning a child process on macOS will by default use this mechanism for you.)

And that’s it, the child process will now be placed at its static address (if possible, of course).
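For readers more at home in C, the same request can be sketched directly against the posix_spawn API. This is a minimal sketch, not the compiler's code: spawn_without_aslr and wait_exit_code are hypothetical helper names, the flag value 0x0100 is the one quoted above, and it is defined locally in case the SDK headers do not expose it. On non-Apple systems the flag is simply left out so the sketch still runs.

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Apple-specific flag; defined here in case the SDK headers do not expose it. */
#if defined(__APPLE__) && !defined(_POSIX_SPAWN_DISABLE_ASLR)
#define _POSIX_SPAWN_DISABLE_ASLR 0x0100
#endif

/* Spawn `file` (looked up via PATH); on macOS also ask for ASLR to be disabled.
 * Returns 0 on success or an errno value, like posix_spawnp itself. */
int spawn_without_aslr(const char *file, char *const argv[], pid_t *pid) {
    assert(file != NULL && argv != NULL && pid != NULL);
    posix_spawnattr_t attr;
    int rc = posix_spawnattr_init(&attr);
    if (rc != 0) return rc;

    short flags = 0;
#ifdef __APPLE__
    flags |= _POSIX_SPAWN_DISABLE_ASLR;
#endif
    rc = posix_spawnattr_setflags(&attr, flags);
    if (rc == 0) rc = posix_spawnp(pid, file, NULL, &attr, argv, environ);

    posix_spawnattr_destroy(&attr);
    return rc;
}

/* Wait for the child and return its exit code, or -1 if it did not exit normally. */
int wait_exit_code(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The returned pid is what the next step needs: it identifies the child whose memory we want to patch.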

Having spawned the process and obtained its PID, we can now use it to open a Mach port to the child process,
which we can then use to update the child process’ memory, ask about the base address it was mapped at, and so on.
In order to open the Mach port, we need to use this Mach kernel function

kern_return_t
task_for_pid(mach_port_name_t target_tport, pid_t pid, mach_port_name_t *t);

The first argument is the Mach port handle for the parent process, and it can be obtained via an extern global variable

extern mach_port_t mach_task_self_;

the middle argument is the PID of the spawned child process, and the final argument is the receiver for the handle
to the opened Mach port. There’s one caveat to using this (very) low-level API: it requires elevated privileges. This
is fine for the purposes of our proof-of-concept but definitely a no-go for production use of hot-code reloading mode.
One way of overcoming this, though please note I haven’t done a lot of investigation on it yet, is to bake
the necessary entitlements into the compiler binary itself [7]. To the best of my knowledge, this is how LLVM’s
lldb does it too. On the topic of debuggers, I am not sure if you have noticed yet, but the hot-code reloading scheme
I describe here is a close cousin of how debuggers work, and if you are curious about how the different bits come together,
I invite you to study lldb’s source code, which is openly available [8].

As with posix_spawn, I have created a few wrappers around Mach ports, and so the above process boils down to

const std = @import("std");
const darwin = std.os.darwin;

var task = try darwin.machTaskForPid(pid);

Having obtained a handle to the Mach port for communicating with the child process, we can now turn to
where the actual magic happens: the linker. Whenever the linker is requested to perform in-place binary
patching as we analysed in the first part of this blog post, while writing the updated symbol to file, we
use the obtained Mach port to simultaneously write the updated contents directly to the mapped memory of the running
process. We should be careful though, as writing to the executable segment will generally fail unless we purposefully
change the current protection attributes of the segment. Thus, the procedure is as follows:

- check if the segment is writable
  - if yes, simply write to it at an offset
  - if not
    - temporarily set the protection attributes to allow writing to it
    - write to the segment at an offset
    - reset the protection attributes to their original level

Since we control the linker, we do not need to actively check if the segment is writable, as we control its
protection attributes. In order to change the mapped segment’s attributes, we need to use the kernel function

kern_return_t
mach_vm_protect(vm_map_t target_task, mach_vm_address_t address, mach_vm_size_t size,
                boolean_t set_maximum, vm_prot_t new_protection);

As before, Zig features a nice abstraction for this, so this can be done as follows using the task obtained via
machTaskForPid(pid: pid_t) MachError!MachTask

try task.setCurrProtection(addr, len, std.c.PROT.READ | std.c.PROT.WRITE | std.c.PROT.COPY);

Notice the funky looking protection flag std.c.PROT.COPY=@as(vm_prot_t, 0x10), which is defined as VM_PROT_COPY
in Apple’s libc. According to the definition, this flag can be used to force a request for write permissions on a mapped
entry. Setting this flag marks the mapped entry as “needing copy”, effectively copying the object using copy-on-write
mechanics, and adds the VM_PROT_WRITE permission to the maximum protections for the associated entry (NB an astute reader
might wonder why bother with updating the protection if we control the linker and could set the segment’s initial
and maximum protection attributes to be writable. Well, as it turns out, on arm64 macOS, the VM_PROT_WRITE protection
permission is not respected for the executable segment, therefore, there is no other way of achieving this than with
the use of the VM_PROT_COPY flag).

Writing into a mapped memory location is fairly straightforward, and can be done with the following kernel
function

kern_return_t
mach_vm_write(vm_map_t target_task, mach_vm_address_t address, vm_offset_t data, mach_msg_type_number_t data_cnt);

In Zig, this becomes

const nwritten = try task.writeMem(addr, &buf, .aarch64);

Putting it all together, we get a routine that looks sort of like this

const sym: nlist_64 = //...
var buf: [LEN]u8 = //...

if (!seg.isWriteable()) {
    // Temporarily make the segment writable (see the note on PROT.COPY above).
    try task.setCurrProtection(sym.n_value, buf.len, std.c.PROT.READ | std.c.PROT.WRITE | std.c.PROT.COPY);
}
defer if (!seg.isWriteable()) {
    // Restore the protection; READ | EXEC is assumed here for an executable segment.
    task.setCurrProtection(sym.n_value, buf.len, std.c.PROT.READ | std.c.PROT.EXEC) catch {};
}
// Here, we would resolve relocations (if any)
try resolveRelocs(sym.n_value);
const nwritten = try task.writeMem(sym.n_value, &buf, .aarch64);

That’s pretty much it!

Now here’s an interesting bonus question: can this be done without disabling ASLR? The answer is yes!

…with ASLR on!

I mean, if the debuggers can do it, so can we, right? With ASLR back in the picture, we need to make an additional
kernel call to ask about the base address of the mapped binary image. To do this, we need the following function

kern_return_t
mach_vm_region_recurse(vm_map_t target_task, mach_vm_address_t *address, mach_vm_size_t *size,
                       natural_t *nesting_depth, vm_region_recurse_info_t info, mach_msg_type_number_t *info_cnt);

Note that the variable address is passed by pointer. This is because after the call completes, address will hold
the value of the base address of the mapped image. We can then use this value to calculate the required slide
value for each non-PC-relative relocation. In other words, with this we’re effectively turning our linker into
a dynamic linker!

Taking our snippet from above, we will end up with something like this

const pagezero_vmsize: u64 = 0x100000000;
const sym: nlist_64 = //...
var buf: [LEN]u8 = //...

const slide: u64 = slide: {
    const info = try task.getRegionSubmapInfo(sym.n_value, buf.len, 0, .short);
    break :slide info.base_addr - pagezero_vmsize;
};
if (!seg.isWriteable()) {
    // Temporarily make the segment writable at its slid address.
    try task.setCurrProtection(sym.n_value + slide, buf.len, std.c.PROT.READ | std.c.PROT.WRITE | std.c.PROT.COPY);
}
defer if (!seg.isWriteable()) {
    // Restore the protection; READ | EXEC is assumed here for an executable segment.
    task.setCurrProtection(sym.n_value + slide, buf.len, std.c.PROT.READ | std.c.PROT.EXEC) catch {};
}
// Here, we would resolve relocations (if any)
// For any non-PC-relative pointer value, resolve and slide
try resolveRelocs(sym.n_value, slide);
const nwritten = try task.writeMem(sym.n_value + slide, &buf, .aarch64);

Note that we subtract the size of the __PAGEZERO segment from the returned base address to get the slide value. Then,
for any relocation that is non-PC-relative and is a pointer, we relocate the pointer value and add the slide value.
This is exactly what dyld would do for all rebase opcodes encoded as part of the “rebase info” subsection of the
LC_DYLD_INFO_ONLY load command.
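The slide arithmetic is small enough to show as a worked example in C. The sketch below assumes the same __PAGEZERO size as the snippet above (0x100000000); compute_slide and rebase are hypothetical helper names, and the runtime base address in the usage note is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of __PAGEZERO, as in the snippet above: the image is linked to start here. */
#define PAGEZERO_VMSIZE UINT64_C(0x100000000)

/* Slide = where the image actually got mapped minus where it was linked. */
uint64_t compute_slide(uint64_t runtime_base) {
    assert(runtime_base >= PAGEZERO_VMSIZE);
    return runtime_base - PAGEZERO_VMSIZE;
}

/* Absolute pointers get the slide added; PC-relative values are left alone. */
uint64_t rebase(uint64_t link_time_value, uint64_t slide, bool pc_relative) {
    return pc_relative ? link_time_value : link_time_value + slide;
}
```

For instance, if the image happened to be mapped at 0x104a30000, the slide would be 0x4a30000, and a GOT cell that held bar's link-time address 0x100001178 would need to hold 0x104a31178 in the running process. With ASLR off the base equals the __PAGEZERO size and the slide is zero, which is why the earlier snippet could ignore it.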

Demo!

Demo captured on an M1 MacBook Air, macOS 12.2.1, latest Zig self-hosted compiler with the patch from the hcs-macos branch [6].
