rand() may call malloc()

We recently came across a weird bug deep inside the latest version of the Thingsquare IoT platform that took us by surprise.

The root cause turned out to be that the rand() function called the malloc() function. We were surprised by this.

And now we want to spread the word, so that everyone knows: rand() may, under certain conditions, call malloc().

What does this mean to the average person?

To most people, not much.

In fact, not even most software developers will be affected by this.

But it definitely affected us.

What we’re talking about here is very low-level. Very bare-metal. Way more bare-metal than most software developers will ever be.

But let’s dive into it.


The Thingsquare IoT platform runs on devices that have ridiculously small amounts of memory.

Memory allocation is a problem

The Thingsquare IoT platform runs on devices that have small amounts of memory.

Very small.

Sometimes as little as 10 kilobytes. And sometimes we get to splurge with as much as 32 kilobytes.

For most modern applications, this is ridiculously small.

We therefore need to be very clever in how we manage this memory.

In the Thingsquare system, we have three different types of memory allocation mechanisms:

  • memb: the memory block allocator. Fixed-size blocks from static pools of blocks.
  • mmem: the managed memory allocator. Dynamic-sized blocks, which can be rearranged to avoid fragmentation, from a static chunk of memory.
  • bmem: the block memory heap allocator. Dynamic-sized blocks from a static chunk of memory.
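To make the first item in the list concrete, here is a minimal sketch of a fixed-size block allocator that hands out blocks from a static pool. It is not the actual memb code: the names pool_alloc and pool_free, and the sizes BLOCK_SIZE and BLOCK_COUNT, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  16  /* bytes per block (hypothetical) */
#define BLOCK_COUNT 4   /* blocks in the pool (hypothetical) */

/* All memory is reserved at compile time; nothing comes from a heap. */
static uint8_t pool_mem[BLOCK_COUNT][BLOCK_SIZE];
static uint8_t pool_used[BLOCK_COUNT];

/* Return a free block, or NULL if the pool is exhausted. */
void *pool_alloc(void)
{
  for (size_t i = 0; i < BLOCK_COUNT; i++) {
    if (!pool_used[i]) {
      pool_used[i] = 1;
      return pool_mem[i];
    }
  }
  return NULL;
}

/* Give a block back. Returns 0 on success, or -1 if ptr is not a
   block from this pool that is currently in use. */
int pool_free(void *ptr)
{
  assert(ptr != NULL);
  for (size_t i = 0; i < BLOCK_COUNT; i++) {
    if (ptr == (void *)pool_mem[i] && pool_used[i]) {
      pool_used[i] = 0;
      return 0;
    }
  }
  return -1;
}
```

The worst case is known up front: the pool can never use more than BLOCK_COUNT * BLOCK_SIZE bytes, and running out shows up as a NULL return rather than as memory growing into something else.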

And, since the system is written in the C programming language, we also use stack memory.

All of the above are based on one idea:

Always pre-allocate everything at compile-time.

This means that we know, beforehand, how much memory is being used. If we use too much, we won’t even be able to compile the code. Much less run it.
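One way to picture this compile-time guarantee is a build-time budget check. The sketch below is an assumption for illustration only: the region names and sizes are invented, and in a real firmware build it is typically the linker that refuses to place more static data than the RAM can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE     (10 * 1024)  /* hypothetical device with 10 KB of RAM */
#define STACK_SIZE   2048         /* hypothetical stack reservation */
#define BLOCK_POOL   (32 * 64)    /* hypothetical: 32 blocks of 64 bytes */
#define MANAGED_HEAP 4096         /* hypothetical managed-memory chunk */

/* Every region is a static array, so its size is fixed at compile time. */
static uint8_t stack_area[STACK_SIZE];
static uint8_t block_pool[BLOCK_POOL];
static uint8_t managed_heap[MANAGED_HEAP];

/* The build stops here if the regions no longer fit in RAM. */
static_assert(sizeof(stack_area) + sizeof(block_pool) + sizeof(managed_heap)
              <= RAM_SIZE, "memory budget exceeded");

/* Total number of bytes reserved by the static regions above. */
size_t ram_budget_used(void)
{
  return sizeof(stack_area) + sizeof(block_pool) + sizeof(managed_heap);
}
```

Raising any of the sizes past the budget turns into a compile error instead of a device that fails in the field.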

What we end up with is a memory layout that looks like this:

The stack starts at the end of the memory and grows downwards. But we need to be careful; more about this below.

The stack is followed by a bunch of memb, mmem, and bmem blocks. As well as other data that the code uses.

Memory layout

Because the memory is so tight, there isn’t much air in that memory layout. Typically, it is almost 100% used.

malloc() may kill you

Note that there is one memory allocation mechanism that is not in the list above: the C language standard malloc()/free() mechanism.

Why?

Because with malloc()/free(), we don’t know beforehand how much memory will be used.

We may end up using more than we’d expect, at runtime. And then, it will be too late. The device may already be dead.

Our devices are deployed in difficult-to-reach places. They need to be available at all times. We can’t afford for them to run out of memory when we least expect it.

In fact, we go to great lengths to avoid any surprises.

We have a thorough test setup, which we run on every code change. This includes running the system in a set of network simulators, so that we can know for sure that the system works as expected. And during manufacturing, we do production testing to make sure that the hardware works.

So we can’t use malloc()/free(). And we don’t.

The stack doesn’t play nice

In the memory layout, the stack is a large chunk. How do we know how much to allocate to the stack?

The stack memory is somewhat tricky to allocate, because its maximum size is determined at runtime.

There is no silver bullet: we can’t predict it. So we need to measure it.

The way we do it is simple: at boot, we fill the stack memory with a known byte pattern. We then run the system. The system will use the stack. This will overwrite that byte pattern.

After we have run the system for a while, we check how much of that byte pattern is left. That gives us an idea of how much stack the system uses.
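The fill-and-measure technique can be sketched in a few lines of C. This is an illustration under assumptions, not the Thingsquare code: the fill value 0xA5 and the function names are hypothetical, and the stack region is passed in as an ordinary buffer so that the sketch can also run on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5  /* known byte pattern (hypothetical value) */

/* Fill the stack region with the pattern. Meant to run once, at boot. */
void stack_paint(uint8_t *region, size_t size)
{
  assert(region != NULL);
  memset(region, STACK_FILL, size);
}

/* The stack grows downwards from the end of the region, so the
   untouched bytes sit at the low addresses. Count them from the
   bottom up until the first overwritten byte. */
size_t stack_unused(const uint8_t *region, size_t size)
{
  size_t n = 0;
  assert(region != NULL);
  while (n < size && region[n] == STACK_FILL) {
    n++;
  }
  return n;
}

/* Deepest stack usage seen so far, in bytes. */
size_t stack_peak_usage(const uint8_t *region, size_t size)
{
  return size - stack_unused(region, size);
}
```

The measurement is a high-water mark: it reports the deepest point the stack has reached since boot, not the current depth. It can slightly underestimate if the deepest bytes written happen to equal the fill value.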

Stack usage

We then allocate a little more stack memory than what we think we need. To be on the safe side.

But we have one trick left: we can keep measuring that byte pattern, even when devices are deployed.

That bug

So all that leads up to that interesting bug that we stumbled across.

We found that some devices, in the field, were using more stack than we had assumed. And that was surprising. Because we had made really sure that the stack size was correct.

This made the device behave strangely. Unpredictably.

So what was going on?

We took a deeper look at one of the devices, in the lab.

We added more places in the code where we measured the stack usage. To try to home in on the part of the code that caused the overflow.

And we found it:

rand();

Huh?

Yes, the standard C library function that produces pseudo-random numbers.

This caused the stack to overflow.

We use rand() in a few places in the code, where we need a quick pseudo-random number that doesn’t need to be cryptographically secure.

It is a simple function that should not use much stack.

So why was it overflowing the stack?

The culprit: rand()

The Thingsquare system uses the newlib standard C library. This is open source, so we can look at the code.

This is the code of the rand_r() function, which looks familiar:

int
rand_r (unsigned int *seed)
{
        long k;
        long s=(long)(*seed);
        if (s==0)
          s=0x12345987;
        k=s / 127773;
        s=16807 * (s - k * 127773) - 2836 * k;
        if (s < 0)
          s +=2147483647;
        (*seed)=(unsigned int)s;
        return (int)(s & RAND_MAX);
}

Why would this code use so much stack that it blew through its bounds?

There are no large arrays or structs allocated on the stack.

There is no recursion.

But it turns out that the problem is not in this code. It is in the code that calls this code.

The newlib library has a reentrancy layer that makes it possible to call functions multiple times, simultaneously.

And this reentrancy code is implemented with C macros. It is difficult to understand from a first glance. This is how the actual rand() function looks:

int
rand (void)
{
  struct _reent *reent=_REENT;

  /* This multiplier was obtained from Knuth, D.E., "The Art of
     Computer Programming," Vol 2, Seminumerical Algorithms, Third
     Edition, Addison-Wesley, 1998, p. 106 (line 26) & p. 108 */
  _REENT_CHECK_RAND48(reent);
  _REENT_RAND_NEXT(reent)=
     _REENT_RAND_NEXT(reent) * __extension__ 6364136223846793005LL + 1;
  return (int)((_REENT_RAND_NEXT(reent)>> 32) & RAND_MAX);
}

Maybe these calls are the problem?

And, yes, as it turns out, they are.

Deep inside that _REENT_CHECK_RAND48() macro, we find:

/* Generic _REENT check macro.  */
#define _REENT_CHECK(var, what, type, size, init) do { \
  struct _reent *_r=(var); \
  if (_r->what==NULL) { \
    _r->what=(type)malloc(size); \
    __reent_assert(_r->what); \
    init; \
  } \
} while (0)

Oooops, a malloc()! That one killer function that we wanted to avoid.
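The pattern in that check macro is lazy allocation: the state is created on first use, so an innocent-looking call can allocate memory. Here is a stripped-down sketch of the same shape. It is not newlib code: struct context, CHECK_STATE, lazy_rand and the counting wrapper counted_malloc are hypothetical names, and the counter exists only to make the hidden allocation visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rand_state {
  unsigned long long next;
};

struct context {
  struct rand_state *rand;  /* NULL until first use */
};

static int alloc_calls;  /* counts heap allocations, for demonstration */

static void *counted_malloc(size_t size)
{
  alloc_calls++;
  return malloc(size);
}

/* Same shape as the check macro above: allocate on first use only. */
#define CHECK_STATE(ctx) do { \
  if ((ctx)->rand == NULL) { \
    (ctx)->rand = (struct rand_state *)counted_malloc(sizeof(struct rand_state)); \
    assert((ctx)->rand != NULL); \
    (ctx)->rand->next = 1; \
  } \
} while (0)

/* Looks like a pure arithmetic function, but the first call allocates. */
int lazy_rand(struct context *ctx)
{
  CHECK_STATE(ctx);
  ctx->rand->next = ctx->rand->next * 6364136223846793005ULL + 1;
  return (int)((ctx->rand->next >> 32) & 0x7fffffff);
}

int lazy_alloc_count(void)
{
  return alloc_calls;
}
```

Only the first call pays for the allocation, which is part of why such a call is easy to miss: every later call behaves exactly like the simple arithmetic it appears to be.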

But is it really used?

Yes, in the compiled code, we see that malloc() being called:

0001ff80 <rand>:
   1ff80:       4b16            ldr     r3, [pc, #88]   ; (1ffdc <rand+0x5c>)
   1ff82:       b510            push    {r4, lr}
   1ff84:       681c            ldr     r4, [r3, #0]
   1ff86:       6ba3            ldr     r3, [r4, #56]   ; 0x38
   1ff88:       b9b3            cbnz    r3, 1ffb8 <rand+0x38>
   1ff8a:       2018            movs    r0, #24
   1ff8c:       f7ff fee4       bl      1fd58 <malloc>
   1ff90:       4602            mov     r2, r0
   1ff92:       63a0            str     r0, [r4, #56]   ; 0x38
   1ff94:       b920            cbnz    r0, 1ffa0 <rand+0x20>
   1ff96:       4b12            ldr     r3, [pc, #72]   ; (1ffe0 <rand+0x60>)
   1ff98:       4812            ldr     r0, [pc, #72]   ; (1ffe4 <rand+0x64>)
   1ff9a:       214e            movs    r1, #78 ; 0x4e
   1ff9c:       f000 f952       bl      20244 <__assert_func>
   1ffa0:       4911            ldr     r1, [pc, #68]   ; (1ffe8 <rand+0x68>)
   1ffa2:       4b12            ldr     r3, [pc, #72]   ; (1ffec <rand+0x6c>)
   1ffa4:       e9c0 1300       strd    r1, r3, [r0]
   1ffa8:       4b11            ldr     r3, [pc, #68]   ; (1fff0 <rand+0x70>)
   1ffaa:       6083            str     r3, [r0, #8]
   1ffac:       230b            movs    r3, #11
   1ffae:       8183            strh    r3, [r0, #12]
   1ffb0:       2100            movs    r1, #0
   1ffb2:       2001            movs    r0, #1
   1ffb4:       e9c2 0104       strd    r0, r1, [r2, #16]
   1ffb8:       6ba4            ldr     r4, [r4, #56]   ; 0x38
   1ffba:       4a0e            ldr     r2, [pc, #56]   ; (1fff4 <rand+0x74>)
   1ffbc:       6920            ldr     r0, [r4, #16]
   1ffbe:       6963            ldr     r3, [r4, #20]
   1ffc0:       490d            ldr     r1, [pc, #52]   ; (1fff8 <rand+0x78>)
   1ffc2:       4342            muls    r2, r0
   1ffc4:       fb01 2203       mla     r2, r1, r3, r2
   1ffc8:       fba0 0101       umull   r0, r1, r0, r1
   1ffcc:       1c43            adds    r3, r0, #1
   1ffce:       eb42 0001       adc.w   r0, r2, r1
   1ffd2:       e9c4 3004       strd    r3, r0, [r4, #16]
   1ffd6:       f020 4000       bic.w   r0, r0, #2147483648     ; 0x80000000
   1ffda:       bd10            pop     {r4, pc}
   1ffdc:       20000770        .word   0x20000770
   1ffe0:       00027bfc        .word   0x00027bfc
   1ffe4:       00027c13        .word   0x00027c13
   1ffe8:       abcd330e        .word   0xabcd330e
   1ffec:       e66d1234        .word   0xe66d1234
   1fff0:       0005deec        .word   0x0005deec
   1fff4:       5851f42d        .word   0x5851f42d
   1fff8:       4c957f2d        .word   0x4c957f2d

To achieve reentrancy, the newlib code uses malloc() to allocate state for its randomness, so that it can be called multiple times.

Exactly what we wanted to avoid.

But why does malloc() result in the stack being blown?

Because malloc(), in its default implementation, uses the memory between the last allocated byte and the stack. Usually, for large systems, this is fine. Because there is plenty of free memory between the last allocated byte and the stack.

But not in our case.

We don’t have much free memory. So that call to malloc() will interfere with the stack, immediately.
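A toy model makes the collision easy to see. The sketch below is an assumption-laden illustration: it only tracks offsets within an imaginary RAM, struct ram_model and both functions are hypothetical, and a real heap-growth routine (commonly called sbrk) is supplied by the platform and may behave differently. The 24 bytes in the usage note match the size loaded before the malloc() call in the disassembly above (movs r0, #24).

```c
#include <assert.h>
#include <stddef.h>

/* Offsets within a hypothetical RAM. Static data sits at the bottom,
   the stack reservation sits at the top. */
struct ram_model {
  size_t heap_end;     /* first byte after the static data and the heap */
  size_t stack_limit;  /* lowest offset reserved for the stack */
};

/* Grow the heap upwards and return the offset of the new block. The
   allocator in this model knows nothing about the stack reservation,
   so it never refuses a request. */
size_t model_sbrk(struct ram_model *m, size_t bytes)
{
  size_t block;
  assert(m != NULL);
  block = m->heap_end;
  m->heap_end += bytes;
  return block;
}

/* True when heap memory overlaps the region reserved for the stack. */
int model_heap_hits_stack(const struct ram_model *m)
{
  assert(m != NULL);
  return m->heap_end > m->stack_limit;
}
```

With a roomy layout such as { 200, 900 }, a 24-byte request fits in the gap and nothing overlaps. With a layout where the static data ends exactly where the stack reservation begins, such as { 1000, 1000 }, the very first 24-byte request lands inside the stack region.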

And luckily we were able to catch this by keeping a check on the stack.

But why did this happen now? We have been running this code for years on end without any problems. As it turns out, the reason is that we recently upgraded the arm-gcc version. And this version has its newlib built with reentrancy support, which the previous versions did not have.

The solution?

Fortunately, the answer is easy.

We just stop using rand().

Instead, we provide our own pseudo-random function. For example, the PCG random number generator.
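As an illustration of what such a replacement could look like, here is a minimal sketch of the 32-bit PCG generator (the XSH-RR variant) with all of its state in static variables. The function names pcg32_next, pcg32_seed and pcg32_below are our own for this example, and this is not necessarily the code that ended up in the Thingsquare system.

```c
#include <assert.h>
#include <stdint.h>

/* All generator state is static: no heap and no allocation on first use. */
static uint64_t pcg_state;
static uint64_t pcg_inc = 1;  /* stream selector, must be odd */

/* One PCG32 step: advance the 64-bit LCG, then permute the old state
   down to 32 bits with a xorshift and a rotation. */
uint32_t pcg32_next(void)
{
  uint64_t old = pcg_state;
  pcg_state = old * 6364136223846793005ULL + pcg_inc;
  uint32_t xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
  uint32_t rot = (uint32_t)(old >> 59);
  return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

/* Seed the generator and select one of the possible streams. */
void pcg32_seed(uint64_t seed, uint64_t stream)
{
  pcg_state = 0;
  pcg_inc = (stream << 1) | 1u;
  (void)pcg32_next();
  pcg_state += seed;
  (void)pcg32_next();
}

/* Value in [0, bound). Plain modulo has a slight bias, which is fine
   for things like timing jitter but not for anything security related. */
uint32_t pcg32_below(uint32_t bound)
{
  assert(bound > 0);
  return pcg32_next() % bound;
}
```

The multiplier is the same constant that appears in the newlib rand() listing above. The difference that matters here is where the state lives: in two static 64-bit variables that are part of the compile-time memory layout.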

Additionally, we added another regression test that explicitly tests for occurrences of the malloc() code in generated binaries.

WRITTEN BY

Vanic
