Rust’s Unsafe Pointer Types Need An Overhaul

Aria Beingessner

March 19th, 2022

I think about unsafe pointers in Rust a lot.

I literally wrote the book on unsafe Rust. And the book on pointers in Rust. And redesigned Rust’s pointer APIs. And designed the standard library’s abstraction for unsafe heap-allocated buffers. And maintain the alternative Vec layout.

I think about unsafe pointers in Rust a lot, and I absolutely hate them.

Don’t get me wrong, I think all of the above work has made them better, but they are still deeply flawed. Actually, they’ve gotten a lot worse. Not because the APIs have changed, but because when I was working on this stuff we had too naive an understanding of how pointers should work. Others have done a lot of great work to expand this understanding, and now the flaws are all the more glaring.

This article is broken up into 3 parts: conceptual background, problems with the current design, and proposed solutions.

This section can be skipped entirely if you know everything about computers.

Aliasing is a very important concept in compilers and language semantics. At a high level, it is the study of the observability of modifications to memory. We call it aliasing because the problem is very easy until you can refer to a piece of memory in more than one way. Pointers are just nicknames for memory.

The primary function of alias analysis is as a model for when the compiler can semantically cache memory accesses. This can mean either assuming a value in memory hasn’t been modified or assuming a write to memory isn’t necessary. This is exceptionally important because essentially all program state is semantically in memory. It is literally impossible for a general-purpose programming language that does anything on behalf of the programmer to allow arbitrary reads and writes to memory.

As a hopefully extremely obvious example that we can all agree on, the compiler should be able to assume that the following program will pass 1 to println!:

let mut x = 0;
let mut y = 1;

// Modify x however you like; nothing here can refer to y
x = 2;

println!("{y}");

When we discuss alias analysis we often jump immediately to pointers because that’s the hard part, but like, the fact that this has deterministic behaviour is part of your aliasing model! Variables are semantically unaliased until you actually take a reference to them!

This is actually a foundational assumption for putting things in registers, because putting something in a register is caching it. If a compiler can’t assume it’s ok to put values in general-purpose registers or spill them to the stack, it’s an assembler at best. We’d like to build languages that are higher-level than an assembler!

This is all well and good until we introduce pointers and need to start answering harder questions. For instance, in the following function, can we assume that input and output refer to different regions of memory?

fn compute(input: &u32, output: &mut u32) {
    if *input > 10 {
        *output = 1;
    }
    if *input > 5 {
        *output *= 2;
    }
}

If we can, then the compiler is free to rewrite it as follows:

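fn compute(input: &u32, output: &mut u32) {
    let cached_input = *input; // keep *input in a register
    if cached_input > 10 {
        // input > 10 implies input > 5, so the original would run
        // `*output = 1` and then `*output *= 2`.
        // That always leaves 2, so store it directly.
        *output = 2;
    } else if cached_input > 5 {
        *output *= 2;
    }
}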

If they do point to the same memory, then the write *output = 1 and the read *input > 5 would alias. When we perform (potentially) aliasing accesses, the compiler has to conservatively load and store from memory as much as the source code implies.
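To see what is at stake, here is a minimal sketch that puts the two versions side by side. The names compute_as_written, compute_rewritten and run_both are made up for this illustration. Safe Rust will not let a &u32 and a &mut u32 alias, so the sketch can only compare the two on non-aliasing inputs, where they should agree:

```rust
/// The function as written in the source text.
fn compute_as_written(input: &u32, output: &mut u32) {
    if *input > 10 {
        *output = 1;
    }
    if *input > 5 {
        *output *= 2;
    }
}

/// The rewrite a compiler may perform if `input` and `output` never alias.
fn compute_rewritten(input: &u32, output: &mut u32) {
    let cached_input = *input;
    if cached_input > 10 {
        *output = 2;
    } else if cached_input > 5 {
        *output *= 2;
    }
}

/// Runs both versions from the same starting `output` value and
/// returns (result as written, result after the rewrite).
fn run_both(input: u32, start: u32) -> (u32, u32) {
    let mut a = start;
    let mut b = start;
    compute_as_written(&input, &mut a);
    compute_rewritten(&input, &mut b);
    (a, b)
}
```

If input and output did refer to the same memory holding 20, the version as written would store 1, then see 1 > 5 fail and finish with 1, while the rewritten version would finish with 2. That divergence is why the rewrite is only legal when the compiler can assume the two don’t alias.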

Now it’s often clumsy to talk about accesses aliasing, so we usually talk about pointers aliasing as a shorthand. So one would reasonably say that input and output are aliased. The reason the actual model is in terms of accesses and not pointers is that accesses are the thing we care about. We don’t actually care if you pass in two pointers that alias but only use one. Similarly, we don’t actually care if two pointers alias but are only used for reads: one read can’t observably affect the other read (this assumption is why memory-mapped hardware has to use volatile).

This is also why Rust has such a clear schism between “unique mutable” references (&mut) and “shared immutable” references (&). It’s fine to make as many copies as you want of pointers which have read-only access, but if you want to actually mutate some memory it’s really important that it isn’t aliased!

(You may notice that this is a simplified model full of lies. If you would like fewer lies, read my extremely detailed discussion of Stacked Borrows.)

Now with all that said, I’m going to use the following shorthands:

  • memory is anonymous if the programmer cannot refer to it by name or pointer.
  • memory is unaliased if there is currently only one way to refer to it.

Anonymous values are in some sense “entirely under the control of the compiler” and can therefore be freely assumed to be unaliased and relied upon/modified by the compiler. Unaliased memory cannot be “randomly” modified by something seemingly “unrelated” (we’ll get to what that means in the next section).

Languages can have stricter or weaker aliasing models. A stricter model allows the compiler to do more optimizations but puts heavier restrictions on what the programmer is allowed to do within the confines of the language. Here are some typical rules, in vaguely increasing strictness:

  • Callee-saved values pushed to the stack are anonymous (return pointer, frame pointer).
  • “Scratch” values the compiler spills to the stack are anonymous.
  • A newly declared variable is unaliased until a reference is taken to it.
  • The memory returned by malloc is unaliased.
  • Fields of a struct do not alias each other (bitfields are made of sadness).
  • Padding bytes are vaguely anonymous (messy because of memcpy/memset/unions/punning).
  • Immutable variables are functionally unaliased in that they can never change values.
  • In Rust, &mut is unaliased (Stacked Borrows).
  • In C(++), T* and U* cannot alias if T != U and neither is char (Strict Aliasing).

(I cannot emphasize enough how shorthanded all of this is. The devil is very much in the details, and formally specifying these things is the subject of untold numbers of PhD theses. I am not trying to write a PhD thesis right now. Unless you literally work on a C/C++ Standards Committee or are named Ralf Jung, I will not be accepting your Umm Actually’s on these definitions and terms.)

1.2 Alias Analysis and Pointer Provenance

Ok, so all of that aliasing stuff is all well and good, but as soon as you take a pointer to something, or copy a pointer, or offset a pointer… that all goes out the window, right? Like you have to assume anything can be aliased by anything?

No! Aliasing rules are some of the most foundational parts of the language’s semantics and optimizations. If you run afoul of the language’s aliasing rules you have Undefined Behaviour and the miscompilations can be extremely brutal!

But yes, once you start faffing around with pointers things get a lot harder for the compiler, memory model, and programmer. To make aliasing a useful concept once pointers start getting thrown around, memory models very quickly find the need to define two concepts:

  • Allocations
  • Pointer Provenance

Allocations abstractly represent things like individual variables and heap allocations. A freshly created allocation (variable decl, malloc) is always brought into the world unaliased and therefore acts like a sandbox with One True Name: there is no way to access the memory in the sandbox other than through the One True Name (that isn’t Undefined Behaviour).

Pointer Provenance describes the way “permission to access an allocation’s sandbox” can be delegated from the One True Name by deriving a new pointer from it or from something derived from it. The process of tracking this “chain of custody” from the One True Name to all of its derived pointers is Pointer Provenance.

From a formal memory model perspective, all accesses to an allocation must have provenance tracing back to that allocation. If pointer provenance isn’t satisfied, then that means the programmer broke out of the sandbox or pulled a pointer from the aether that happened to point into some random sandbox. Either way, everything is chaos and nothing makes sense anymore if that’s allowed.

From a compiler optimization perspective, tracking provenance allows the compiler to prove that two accesses don’t alias. If it ever loses track of provenance (e.g. if a pointer is passed to an opaque function) then it just conservatively assumes they do alias. But if the compiler never loses track of all the derived pointers to an allocation, it can have perfect aliasing information and do Some Good Codegen.

This is the fundamental trick compilers apply to all typically impossible problems: have a really simple analysis that can answer your question with “YES”, “NO”, or “MAYBE” and then convert “MAYBE” to “YES” or “NO” based on whichever one is safer. Do these two accesses MAYBE alias? Then YES they alias. Problem Solved.

But once you get low-level enough the compiler needs you to help it out and actually follow some dang rules. This is why the LLVM GetElementPtr (GEP) instruction, which computes a pointer offset, is almost always emitted by compilers with the inbounds keyword. The inbounds keyword is basically “I promise this offset won’t break the pointer out of its allocation sandbox and completely trash aliasing and provenance”. Which like, yeah, all your pointer offsets should follow that rule!

Let’s go up a level and think about Rust: any time you do (*ptr).my_field the compiler will emit GEP inbounds. Have you ever wondered why the documentation for ptr::offset and friends is so weird and complicated? Because they lower to GEP inbounds and need to follow its rules! ptr::wrapping_offset is just GEP without the inbounds. And even wrapping_offset isn’t actually allowed to break provenance:

Compared to offset, this method basically delays the requirement of staying within the same allocated object: offset is immediate Undefined Behavior when crossing object boundaries; wrapping_offset produces a pointer but still leads to Undefined Behavior if a pointer is dereferenced when it is out-of-bounds of the object it is attached to.
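As a hedged illustration of what that quote permits, the sketch below wanders far outside an array with wrapping_add, comes back with wrapping_sub, and only then dereferences. round_trip_read is a hypothetical helper name; wrapping_add and wrapping_sub are the standard raw-pointer methods.

```rust
/// Walks a pointer far past the end of `data` and back again using the
/// wrapping offset methods, then reads through it.
fn round_trip_read(data: &[u32; 4]) -> u32 {
    let base: *const u32 = data.as_ptr();

    // Far outside the array. Merely computing this pointer is allowed with
    // the wrapping variants; dereferencing `far` would be Undefined Behaviour.
    let far = base.wrapping_add(1000);

    // Back in bounds, at index 2. The pointer is still derived from `base`,
    // so it still carries permission to access `data` and nothing else.
    let back = far.wrapping_sub(998);

    // SAFETY: `back` points at data[2], inside the allocation it came from.
    unsafe { *back }
}
```

Swapping wrapping_add for add here would make the first offset itself Undefined Behaviour, which is the “delayed requirement” the documentation describes.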

Mea culpa, I spent years calling CHERI completely unshippable vaporware! I was pretty confident, but I’ll eat my hat because ARM Morello actually built and shipped a full CHERI-based CPU. Congratulations to everyone who worked on it!

So what’s CHERI? I’m not going to get into the nitty-gritty details, but roughly speaking it’s a 128-bit architecture. Well, actually it’s 129-bit. Well, actually it’s 64-bit. Sorry, what?

Ok, so the whole Idea with CHERI is that it actually reifies and implements the “sandboxing” model from the previous section. Every pointer is tagged with extra metadata that the hardware maintains and validates. If you ever break out of your sandbox the hardware will catch it and the OS will presumably kill your process.

I don’t know the full details of the encoding or metadata, but the part we care about is that every pointer contains a compressed slice (range of memory) that it is allowed to point into, as well as the actual address that it points to. The slice is that pointer’s sandbox, and all pointers derived from it inherit that sandbox (or less). Whenever you access some memory, the hardware just checks that the pointer is still inside its sandbox.

This metadata isn’t cheap: pointers in CHERI are 128 bits wide, but the actual address space is still at most 64-bit (I don’t know the exact upper bound, all that matters is that addresses fit in 64 bits). Now 128-bit is really bloated, so in C(++) CHERI actually gets help from our old nemesis The Wobbly C Integer Hierarchy.

C makes a distinction between intptr_t (“a pointer-sized integer”) and ptrdiff_t/size_t (“offset-sized integers”). Under CHERI, intptr_t is 128-bit and ptrdiff_t/size_t are 64-bit. It can do this because the address space is still only 64-bit, so anything that refers to offsets or sizes can still be 64-bit.

Ok, so you might have two burning questions at this point: how on earth can this possibly work if I can just scribble over a pointer and corrupt its metadata, and why did you say it’s actually 129-bit? As it turns out, these are the same question!

I find the best way to conceptualize this is to think of it like ECC (Error Correction Code) RAM. In ECC RAM, each RAM stick actually has more physical memory than it claims, because it’s transparently using that extra memory to correct or detect random bitflips. So there’s all this extra memory somewhere, but as far as a compiler or programmer is concerned, the memory looks perfectly normal and doesn’t have any weird extra bits.

CHERI does the same thing, but the extra 129th bit the hardware is hiding from you is a “metadata is valid” bit. You see, to properly manipulate pointers in CHERI you have to access the memory/registers containing a pointer with specific instructions for that task. If you try to manipulate them some other way (say by memcpying random bytes over it), the hardware will disable the “metadata is valid” bit. Then if you try to use the pointer as a pointer, the hardware will see your metadata can’t be trusted and fault/kill your process.

It’s friggin’ neato!

(Many of the issues we’ll see with integrating Rust with CHERI will actually look a lot like issues with segmented architectures, but I have never used those so I would just be vaguely gesturing at them and handwaving. Just keep in mind that whenever I refer to CHERI, a similar argument also probably applies to segmenting. So if you care about segmented architectures, you might care about CHERI too!)

Now let’s see how Rust’s current unsafe pointer APIs cause problems for all the background we’ve seen above.

Rust currently says this code is totally cool and fine:


// Clear a tag bit that was packed into the low bit of the pointer
let mut addr = my_ptr as usize;
addr = addr & !0x1;
let new_ptr = addr as *mut T;
*new_ptr += 10;

This is some pretty bog-standard code for messing with tagged pointers. What’s wrong with that?

Think about the background we just discussed. Think about Pointer Provenance. Think about CHERI.

🙀 aAAaAAaaaAAaAAAAAA 🙀

For this to possibly work with Pointer Provenance and Alias Analysis, that stuff must pervasively infect all integers on the assumption that they might be pointers. This is a huge pain in the neck for people who are trying to actually formally define Rust’s memory model, and for people who are trying to build sanitizers for Rust that catch UB. (And I assure you it’s just as much a headache for all the LLVM and C(++) people too.)

For this to possibly work with CHERI we need to make usize 128-bit (even though the address space is 64-bit) and always manipulate it with “pointer instructions” on the assumption that it might be a pointer, in the vein of intptr_t. Yes, people have tried running Rust under CHERI, and that’s exactly what they had to do. It was Not Good.

Unfortunately for CHERI, Rust actually defines usize to be the same size as a pointer, even though its primary role is to be an offset/size. This is a very reasonable assumption for mainstream platforms, but it runs afoul of CHERI (and segmented architectures)!

If you don’t make usize 128-bit and just try to make it the 64-bit “address” portion, then usize as *mut T is a completely incoherent operation. Promoting an integer to a pointer (or what CHERI calls a capability) requires adding metadata to it. What metadata? What range is this random address possibly valid for? There is literally no way to answer that question!

Now you might be thinking “okay, but pointer tagging is a super important thing, are you saying we can’t do that anymore?”. Nope! You can absolutely still do tagging tricks, but you need to be a little more careful about it. This is why CHERI actually introduces a special operation, void* cheri_address_set(void* capability, vaddr_t address), which takes a valid pointer and an address, and creates a new pointer with the same metadata and that address. That might be useful to keep in mind!

Hey! That operation also looks really useful for provenance, doesn’t it? By associating our int-to-ptr operation with an existing pointer we’re reestablishing the provenance chain of custody! The new pointer is derived from the old one, and compilers and memory models can be happy! HMMMM…
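The same idea can be approximated in ordinary Rust. The sketch below redoes the tagging example without ever casting an integer back to a pointer: the new pointer is produced by offsetting the old one. set_address, tag, untag, is_tagged and demo_tagging are hypothetical names for this illustration, not standard library APIs, and the ptr as usize casts are used only to read an address.

```rust
/// Hypothetical helper: returns a pointer whose address is `addr` but which
/// is derived from `ptr`, instead of being conjured from a bare integer.
fn set_address<T>(ptr: *mut T, addr: usize) -> *mut T {
    // Only *reads* the current address; no integer becomes a pointer here.
    let current = ptr as usize;
    let delta = addr.wrapping_sub(current);
    // Offset the original pointer bytewise to land on the requested address.
    (ptr as *mut u8).wrapping_add(delta) as *mut T
}

/// Low bit used as the tag. Assumes the pointee's alignment is at least 2.
const TAG_MASK: usize = 0x1;

fn tag<T>(ptr: *mut T) -> *mut T {
    set_address(ptr, (ptr as usize) | TAG_MASK)
}

fn untag<T>(ptr: *mut T) -> *mut T {
    set_address(ptr, (ptr as usize) & !TAG_MASK)
}

fn is_tagged<T>(ptr: *mut T) -> bool {
    ((ptr as usize) & TAG_MASK) != 0
}

/// Tags a pointer, untags it, and writes through the result.
fn demo_tagging() -> (bool, bool, u32) {
    let mut value: u32 = 5; // u32 is 4-byte aligned, so bit 0 is free
    let ptr: *mut u32 = &mut value;

    let tagged = tag(ptr);
    let clean = untag(tagged);

    // SAFETY: `clean` has the same address as `ptr` and was derived from it.
    unsafe { *clean += 10 };

    (is_tagged(tagged), is_tagged(clean), value)
}
```

The design choice is that every pointer in the chain is computed from a previous pointer, so a memory model (or CHERI-like hardware) always has something to inherit provenance and metadata from.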

In the good old days of Rust 1.0, we were pretty optimistic about how fast-and-loose we could be with raw pointers. Well, okay, we were actually pretty rigorous about pointers by most people’s standards. We mostly enforced GEP inbounds semantics, alignment, carved out how to work with ZSTs, enforced that allocations can’t be larger than isize::MAX, etc.

But what we played really fast and loose on was aliasing and validity. In Ye Olde conception of Rust, references were kinda just conveniences. Like yes, they asserted many things, but not in a compiler-optimization kinda way. Just in a “this API guarantees this” kinda way. We all vaguely knew we wanted the optimization-y stuff but no one had spent the energy to work that out.

Nowadays, between unsafe-code-guidelines, miri and stacked borrows, a lot of people have put a lot of thought into this. Miri is especially useful because it lets us “kick the tires” on real code and check if the semantics we’re interested in are actually obeyed by real unsafe Rust code.

They Weren’t! This is why the unsafe queue in Learn Rust With Entirely Too Many Linked Lists suddenly comes to a halt for a 4000-word delve into miri and stacked borrows! Even something as boring as a linked queue in Ye Olde Rust had confusing and busted semantics.

(For real, read that chapter if you want to understand stacked borrows, I’m not rehashing it here.)

The important part is that under our modern understanding of Rust, even creating a reference is making an extremely strong validity assertion and has side effects on the “borrow stack”, which in turn changes which references are considered to be invalidated or not. For a reference to T this potentially includes:

  • The reference is aligned
  • The reference is non-null
  • The pointed-to memory is allocated and has at least size_of::<T>() bytes.
  • If T has invalid values, the pointed-to memory does not contain one.
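To make those assertions a bit more tangible, here is a minimal sketch of a hypothetical helper, try_as_ref (not a standard API), that checks the two requirements that can be tested at runtime before it creates a reference. The other two cannot be checked from inside the function and stay the caller’s promise, which is part of why creating a reference is such a strong claim.

```rust
use std::mem::align_of;

/// Hypothetical helper: checks the reference requirements that *can* be
/// tested at runtime (non-null, aligned) and only then creates a reference.
///
/// # Safety
/// The caller must still guarantee that the memory is allocated, holds a
/// valid `T`, and is not mutated for the lifetime `'a`.
unsafe fn try_as_ref<'a, T>(ptr: *const T) -> Option<&'a T> {
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: non-null and aligned were checked above; everything else is
    // the caller's obligation as documented.
    Some(unsafe { &*ptr })
}
```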

The upshot of all of this is that in general you should avoid mixing references and unsafe pointers. Unsafe code should generally provide a safe referency interface at its API boundary, and then internally drop the references and try to stay in unsafe pointer land. This way you minimize the strong assertions you make about your sketchy low-level data structures’ memory.

Ok, simple enough, right?

2.3 Offsets And Places Are A Mess

So you’re trying to be responsible and stay in unsafe pointer land and it’s time to offset a pointer. That’s easy, we have ptr::offset/add/sub for that! Let’s just offset to this struct’s field… uh… wait, what’s that field’s offset?

Oh, Rust just doesn’t tell me that, huh? Well, maybe you can do something like &mut (*ptr).my_field?

Oh no wait, that made a reference. Yes, even if you immediately cast it to a raw pointer. How the heck do you take an address without creating a reference? Also, is it actually fine for me to use a reference to initialize uninit memory? Kind of, sometimes.

This is a huge confusing mess. For a really long time we tried to have some kind of “5-second rule” thing where if you converted the reference to a raw pointer “fast enough” then it’s OK, but that was pretty clearly untenable for a formal model. So people came up with a proper RFC for raw addresses, and for a very long time we’ve had a hacky addr_of macro that lets you do this: addr_of!((*ptr).my_field)

…yeah I hate it too.

And even that didn’t put the nail in the coffin. There was recently a post by a very experienced Rust developer that basically amounted to confused frustration at the current situation with doing this stuff with uninitialized memory. Meanwhile the actual thing proposed in the RFC has seemingly been stalled out for years because the Experts on this stuff are themselves confused by the corner cases of addr_of.

And to make it even worse, addr_of also makes it really hard to do a thing people still want, which is static offsetof.
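For reference, this is roughly what computing a field offset with addr_of looks like. It is a sketch under assumptions: Header and offset_of_value are made-up names, and #[repr(C)] is added only so the layout is predictable. Note that it runs at runtime against a scratch MaybeUninit rather than being a static offsetof.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of;

// Hypothetical example type; repr(C) fixes the field order and padding.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    flag: u8,
    value: u32,
}

/// Computes the byte offset of `Header::value` without ever creating a
/// reference to the uninitialized memory.
fn offset_of_value() -> usize {
    let uninit = MaybeUninit::<Header>::uninit();
    let base: *const Header = uninit.as_ptr();

    // SAFETY: `addr_of!` only computes the field's address; nothing is read,
    // and the address stays inside the `uninit` allocation.
    let field: *const u32 = unsafe { addr_of!((*base).value) };

    (field as usize) - (base as usize)
}
```

Having to allocate a dummy value and subtract addresses just to learn a layout fact is the clunkiness being complained about here.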

It is my assertion that a lot of this boils down to 2 facts:

  • Dereferencing Pointers Is Fake Nonsense
  • Places Are Extremely Confusing Magic (Rust’s term for lvalues)

Like if you think about what “dereferencing” a pointer is… it’s actually nothing? Like dereferencing a pointer doesn’t actually do a thing. It puts you in “place expression” mode and lets you specify an offset to subfields/indices of the pointee, and then what actually happens is only specified at the end. e.g.

(*ptr).field1.field2[idx3].field4;      // load
(*ptr).field1.field2[idx3].field4 = 5;  // store
&(*ptr).field1.field2[idx3].field4      // just compute the address (as a reference)

This is really familiar syntax, but it’s honestly also kind of magical in the exact same way autoderef is in Rust. That is to say, it makes a kind of sense and honestly you just don’t need to think about the fact that it’s happening in safe Rust code because you know the compiler has your back and will help out if anything goes wrong. But in unsafe Rust code? This stuff is way too fuzzy. I literally can’t tell if the indexing is into a slice or an inline array, or if any of those .s is dereferencing stuff implicitly.

As I said before, when you’re doing unsafe pointer stuff you want to stay in that mode. That is currently impossible with this place-expression design, because as soon as you deref you’re kinda in a weird twilight between safe and unsafe!

Alright, and now this is where I start going off the rails and proposing wild overhauls to Rust with almost no regard for “parsing”.

3.1 Distinguish Pointers And Addresses

The relationship between usize and pointers needs to be completely overhauled, and I would take a chainsaw to it (using proper editions and deprecation periods).

Here’s the high-level view of our tasteful chainsawing:

  • Define a distinction between a pointer and an address
  • Redefine usize as address-sized, which is <= pointer-sized
  • Define ptr.addr() -> usize and ptr.with_addr(usize) -> ptr methods
  • Deprecate usize as ptr and ptr as usize

First off, these definitions. A pointer is still a pointer as we know it, but we now acknowledge that it points into a specific address space. A pointer also contains an address, which is conceptually an offset into this address space. (For all major architectures and CHERI there is only one address space as far as I’m concerned, but it’s probably worth opening the door for properly talking about pointers in segmented architectures here.)

A usize is large enough to contain all addresses for all address spaces on that platform. For major architectures, that means a usize is still pointer-sized. For CHERI, that means usize can (and will) be a u64 and is equivalent to CHERI’s vaddr_t. (Again, hopefully this generic definition is useful for segmenting, but I’m not going to pretend I know that.)

As a result, anyone writing maximally portable Rust must now update their assumptions:

size_of::<usize>() == size_of::<*mut u8>()

is now:

size_of::<usize>() <= size_of::<*mut u8>()

There should probably be a cfg(target_address_size_is_pointer_size) or something to allow people to specify that software is incompatible with weird platforms where the strict equality doesn’t hold.
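Until something like that cfg exists, code that cares can at least state which of the two assumptions it relies on. A minimal sketch, with made-up function names:

```rust
use std::mem::size_of;

/// The weaker invariant the proposal would guarantee on every platform:
/// an address always fits in a pointer.
fn address_fits_in_pointer() -> bool {
    size_of::<usize>() <= size_of::<*mut u8>()
}

/// The stronger assumption much existing code makes. It holds on today's
/// mainstream targets, but under this proposal it would not hold on CHERI.
fn address_is_pointer_sized() -> bool {
    size_of::<usize>() == size_of::<*mut u8>()
}
```

Code that genuinely needs the strict equality could check address_is_pointer_sized() (or assert it at startup) so the incompatibility is at least loud rather than silent.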

Next off, replacing casts with methods. The following new methods would be added:


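// Pseudocode: you can't actually write this kind of impl yourself
impl<T: ?Sized> *mut T {
    /// Gets the address portion of the pointer.
    ///
    /// On most platforms this is a no-op, as the pointer is just an address,
    /// and is equivalent to the deprecated `ptr as usize` cast.
    ///
    /// On platforms like CHERI and segmented architectures this may drop
    /// important metadata. See `with_addr` for why that matters.
    fn addr(self) -> usize;

    /// Creates a new pointer with the given address.
    ///
    /// This replaces the deprecated `usize as ptr` cast, which cannot say
    /// which segment, provenance, or CHERI metadata the result should have.
    ///
    /// The new pointer is derived from `self`: it keeps `self`'s segment,
    /// provenance, and metadata, and only the address changes.
    fn with_addr(self, addr: usize) -> Self;
}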

Deprecating the casts may seem extreme, but as far as I’m concerned this is the exact same situation as when we deprecated mem::uninitialized. The design of these casts is fundamentally broken under both Pointer Provenance and CHERI. Everyone needs to use a better design that actually has a coherent meaning.

Now technically you could keep ptr as usize, but I think it’s better to replace both for several reasons:

  • Getting a deprecation warning for both sides of the cast raises a huge red flag to everyone doing anything even vaguely dubious with a usize that they have some thinking to do.
  • You can’t hang documentation off of casts. As I hope this post demonstrates, int-ptr stuff is really subtle and hairy, and deserves a lot of detailed documentation!
  • ptr as usize is horribly clunky in practice, so destroying it is honestly a mercy.
  • Straight-up symmetry/aesthetics. It’s weird to only have one!

As the documentation notes, the new with_addr method serves several roles:

  • It lets us reconstitute what segment the address goes to (hopefully)
  • It lets us reconstitute provenance for the purposes of memory models / alias analysis
  • It lets us reconstitute metadata for the purposes of CHERI (but this is just a reification of provenance)

…that’s it! It just fixes the problem. That’s all you need to fix provenance and CHERI! (And probably also improve segmenting.)

(In reality there would need to be some more special APIs added to satisfy the current Jank uses of ptr-int conversions, but that really needs to be shaken out on crates.io and with the community.)

Unclear thing: is addr/with_addr also necessary/useful for ARMv8.3 Pointer Authentication? This is a technology that Apple ships and involves some pointers getting signed/obfuscated to make it a little harder to do memory-safety exploits. I haven’t looked into it enough to know what level of abstraction this “leaks” into. I just know about it because it shows up in minidumps and we need to hackily try to strip it out.

3.2 Fixing Places and Offsets

Ok, so this is a two-part combo of syntactic niceties to make it a lot easier to work with unsafe pointers.

Hey Did You Know Rust Never Actually Removed Tilde (~) From The Syntax?

Did you also know that ~ was one of the things that originally drew me to messing around with Rust back in like 2014, and that I was very sad to learn that it was already being removed?

Well, today I get my justice. Today I return ~ to its rightful place of glory.

I am almost certain that I’m going to run afoul of parser ambiguities somewhere here, but hey, this isn’t a real RFC and I get to make the rules. Maybe it works fine. Maybe it can be fixed in an edition. Let’s find out!

Here is my grand vision that will solve all of Rust’s woes around “staying in unsafe pointer mode” and just generally dealing with offsets: if you write ~ instead of . it always does a raw pointer offset.

That’s it. Ok, that’s not just it, but that’s basically the whole idea. Let’s start with some examples:


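// Some type we want to initialize in place
struct MyType {
    field1: bool,
    field2: Vec<u8>, // element type is an assumption for illustration
    field3: [u32; 4],
}

// Proposed syntax: initialize field by field without leaving raw-pointer mode
let init = unsafe {
    let mut uninit = MaybeUninit::<MyType>::uninit();
    let ptr = uninit.as_mut_ptr();

    ptr~field1.write(true);
    ptr~field2.write(vec![]);
    ptr~field3~[0].write(7);
    ptr~field3~[1].write(2);
    ptr~field3~[2].write(12);
    ptr~field3~[3].write(88);

    uninit.assume_init()
};

// Proposed syntax: a static offsetof, written as a path from the type
unsafe {
    const MY_FIELD_OFFSET: usize = MyType~field3~[2];

    let mut uninit = MaybeUninit::<MyType>::uninit();
    let ptr = uninit.as_mut_ptr() as *mut u8;

    // Byte-offset to field3[2], then cast back to the element type
    let field_ptr = ptr.wrapping_add(MY_FIELD_OFFSET) as *mut u32;
}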

Yes, ~[idx] is a little wonky, but it’s clear and concise and that’s the most important thing. Note that this is what you would have to do in today’s Rust:

// Assumes MyType from above
use std::{mem::MaybeUninit, ptr::addr_of_mut};

let init = unsafe {
    let mut uninit = MaybeUninit::<MyType>::uninit();
    let ptr = uninit.as_mut_ptr();

    // addr_of_mut! yields a *mut to the field without creating a reference
    addr_of_mut!((*ptr).field1).write(true);
    addr_of_mut!((*ptr).field2).write(vec![]);
    addr_of_mut!((*ptr).field3[0]).write(7);
    addr_of_mut!((*ptr).field3[1]).write(2);
    addr_of_mut!((*ptr).field3[2]).write(12);
    addr_of_mut!((*ptr).field3[3]).write(88);

    uninit.assume_init()
};

Or if you're being clever and trying to leverage the fact that POD types can be initialized without write:

let init = unsafe {
    let mut uninit = MaybeUninit::<MyType>::uninit();
    let ptr = uninit.as_mut_ptr();

    // Plain assignment is fine for POD fields: there is no old value to drop
    (*ptr).field1 = true;
    // Vec has a destructor, so it still has to go through write()
    addr_of_mut!((*ptr).field2).write(vec![]);
    (*ptr).field3[0] = 7;
    (*ptr).field3[1] = 2;
    (*ptr).field3[2] = 12;
    (*ptr).field3[3] = 88;

    uninit.assume_init()
};

What I really like about the ~ version is that:

  • It's all postfix, just like we learned is Very Good And Nice with .await! Just look at how bad addr_of_mut!((*ptr).field2).write(vec![]) is!

  • Because you stay in raw-pointer mode, it's much less painful to reach for things like the read and write methods. This makes it just as easy to do more subtle things like read_volatile, and doesn't encourage you to be "clever" and lean on the fact that things happen to be POD. Using write for everything is a very nice kind of mindlessly simple.

  • You never have to worry about accidentally tripping over autoderef or any other thing that's sweet for safe code but a huge hazard for unsafe code.
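For comparison, here is a minimal sketch of what "staying in raw-pointer mode" looks like in today's Rust, including a volatile write to show that the subtler operations use the same shape. The function name init_in_pointer_mode and the Vec<u8> element type are assumptions made for this illustration, not something from the proposal.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct MyType {
    field1: bool,
    field2: Vec<u8>, // element type assumed for illustration
    field3: [u32; 4],
}

/// Hypothetical helper: initializes every field through raw pointers only,
/// never creating a reference to uninitialized memory.
fn init_in_pointer_mode() -> MyType {
    let mut uninit = MaybeUninit::<MyType>::uninit();
    let ptr = uninit.as_mut_ptr();

    unsafe {
        addr_of_mut!((*ptr).field1).write(true);
        addr_of_mut!((*ptr).field2).write(Vec::new());

        // Turn the pointer-to-array into a pointer to its first element,
        // then offset it per element.
        let elems = addr_of_mut!((*ptr).field3) as *mut u32;
        for (i, value) in [7u32, 2, 12, 88].iter().enumerate() {
            // write_volatile has the same call shape as write.
            elems.add(i).write_volatile(*value);
        }

        // Every field has been written, so this is sound.
        uninit.assume_init()
    }
}
```

Calling init_in_pointer_mode() should hand back a fully initialized MyType. The point is only that each line needs the addr_of_mut!((*ptr).…) wrapper that ~ would make unnecessary.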

I am less sure about the const offsetof stuff, but it seems like it could be made to work and I know people really want that stuff.
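As a point of reference, the offset use case can be approximated in current Rust with the standard offset_of! macro, assuming a toolchain recent enough to have it (it was stabilized after this article was written). The macro does not accept array indices, so the element offset is added by hand. The helper name write_third_element and the Vec<u8> element type are assumptions for this sketch.

```rust
use std::mem::{offset_of, size_of, MaybeUninit};

struct MyType {
    field1: bool,
    field2: Vec<u8>, // element type assumed for illustration
    field3: [u32; 4],
}

// Byte offset of field3[2] from the start of MyType: the offset of the
// array itself plus two elements.
const MY_FIELD_OFFSET: usize = offset_of!(MyType, field3) + 2 * size_of::<u32>();

/// Hypothetical helper: writes `value` into field3[2] of an otherwise
/// uninitialized MyType using only the byte offset, then reads it back.
fn write_third_element(value: u32) -> u32 {
    let mut uninit = MaybeUninit::<MyType>::uninit();
    let base = uninit.as_mut_ptr() as *mut u8;

    // The offset is in bounds and u32-aligned because it was derived from
    // the type's own layout.
    let field_ptr = base.wrapping_add(MY_FIELD_OFFSET) as *mut u32;

    unsafe {
        field_ptr.write(value);
        field_ptr.read()
    }
}
```

This gets the job done, but the offset arithmetic and the pointer access use two unrelated notations, which is the gap a single MyType~field3~[2] spelling is meant to close.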

Additional notes:

  • It should still propagate *const vs *mut
  • I don't know if it's possible or a good idea, but maybe you could also use ~ on actual references and non-references (int, struct, array, tuple…) and not just raw pointers?
    • If so, it might be neat to support val~self as a nicer postfix way to write &mut val as *mut _, but that is messy if you support both references and non-references, since it could be ambiguous as to whether you want a raw-pointer-to-ref or just a raw pointer.

This is the least well-formed idea in here, but as long as we're cleaning up raw pointers with nice and clean postfix syntax, it'd be really nice if we also had postfix deref. Note that because of the whole "creating a reference does magic assertions" thing, you can't provide something like a deref(self) -> &T method that actually has the same semantics as *ptr! Dereferencing raw pointers must have first-class syntax.
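One concrete case where a reference asserts more than a raw place does is alignment. The sketch below uses a packed struct, which is my own choice of illustration rather than an example from the proposal; the names Packed, read_value and demo_packed are hypothetical. A deref-style method returning &u32 could not be used for the value field, because a reference must be aligned, while the place expression (*ptr).value under addr_of! is fine.

```rust
use std::ptr::addr_of;

#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32, // sits at offset 1, so it is not 4-byte aligned
}

/// Reads `value` without ever creating a `&u32`.
///
/// Safety: `ptr` must point at a live, readable `Packed`.
unsafe fn read_value(ptr: *const Packed) -> u32 {
    // `(*ptr).value` is only a place here; addr_of! takes its address
    // directly, so no reference (and no alignment assertion) is involved.
    unsafe { addr_of!((*ptr).value).read_unaligned() }
}

fn demo_packed() -> u32 {
    let packed = Packed { tag: 1, value: 0xDEAD_BEEF };
    // &Packed coerces to *const Packed at the call site.
    unsafe { read_value(&packed) }
}
```

Writing &(*ptr).value instead is rejected by the compiler for packed fields, which is a small instance of why the deref has to be syntax rather than a method.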

Consider for example trying to access some multiply-indirected value.

Today:

addr_of!((*(*(*ptr1).field1.ptr2).ptr3).field4).read()

With offset syntax:

(*(*ptr1~field1~ptr2)~ptr3)~field4.read()

With offset syntax and reads (remember, we're staying in pointer mode, so ptr2.read() is loading ptr2 from memory, so we can actually get away with just using ~ syntax):

ptr1~field1~ptr2.read()~ptr3.read()~field4.read()

With postfix deref:


// Postfix deref spelled .*
ptr1~field1~ptr2.*~ptr3.*~field4.read()

// Postfix deref spelled ~*
ptr1~field1~ptr2~*~ptr3~*~field4.read()

(I think I independently came up with this .* syntax, but I was very pleased to learn that Zig actually has it, and ptr.* += 1 is valid Zig syntax!)

The ambiguity problem with .* in Rust is that 1. is a valid float literal, so stuff like 1.*1. is valid syntax and dangerously close to 1.*.1. I can certainly imagine bad ambiguities!

One nice thing about the ~ syntax is that because unary * already binds fairly weakly (which is why we have to do (*ptr).field), if you only need to do one deref, it's as clean as using normal references: *ptr~field would parse as *(ptr~field).
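To make the multiply-indirected comparison concrete, here is a sketch of the "today" form in compilable Rust, next to a stepwise version that mirrors the ptr1~field1~ptr2.read()~ptr3.read()~field4.read() chain. The struct names (Outer, Field1, Middle, Inner) and function names are hypothetical; only the field names come from the example.

```rust
use std::ptr::addr_of;

struct Inner { field4: u32 }
struct Middle { ptr3: *const Inner }
struct Field1 { ptr2: *const Middle }
struct Outer { field1: Field1 }

/// The single-expression form from the "today" example.
///
/// Safety: `ptr1` and every pointer reachable through it must be valid.
unsafe fn read_nested(ptr1: *const Outer) -> u32 {
    unsafe { addr_of!((*(*(*ptr1).field1.ptr2).ptr3).field4).read() }
}

/// The same access, one hop at a time: each step computes a field address
/// and loads the next pointer from it, never creating a reference.
///
/// Safety: same requirements as `read_nested`.
unsafe fn read_nested_stepwise(ptr1: *const Outer) -> u32 {
    unsafe {
        let ptr2 = addr_of!((*ptr1).field1.ptr2).read();
        let ptr3 = addr_of!((*ptr2).ptr3).read();
        addr_of!((*ptr3).field4).read()
    }
}

fn demo_nested() -> (u32, u32) {
    let inner = Inner { field4: 42 };
    let middle = Middle { ptr3: &inner };
    let outer = Outer { field1: Field1 { ptr2: &middle } };
    let ptr1: *const Outer = &outer;
    unsafe { (read_nested(ptr1), read_nested_stepwise(ptr1)) }
}
```

Both functions perform the same loads. The stepwise one reads left to right, which is the ordering the postfix proposals are trying to recover without the intermediate bindings.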
