We recently came across a strange bug deep inside the latest version of the Thingsquare IoT platform that took us by surprise.
The root cause turned out to be that the rand() function called the malloc() function. We were shocked by this.
And now we’re trying to spread the word, so that everyone knows: rand() may, under certain conditions, call malloc().
What does this mean to the average person?
To most people, not much.
In fact, not even most software developers will be affected by this.
But it certainly affected us.
What we’re talking about here is very low-level. Very bare-metal. Way more bare-metal than most software developers will ever be.
But let’s dive into it.
Memory allocation is a problem
The Thingsquare IoT platform runs on devices that have small amounts of memory.
Very small.
Sometimes as little as 10 kilobytes. And sometimes we can splurge with as much as 32 kilobytes.
For most modern applications, this is ridiculously small.
We therefore have to be very clever in how we manage this memory.
In the Thingsquare system, we have three different types of memory allocation mechanisms:
- memb: the memory block allocator. Fixed-size blocks from static pools of blocks.
- mmem: the managed memory allocator. Dynamic-sized blocks, which can be rearranged to avoid fragmentation, from a static chunk of memory.
- bmem: the block memory heap allocator. Dynamic-sized blocks from a static chunk of memory.
And, since the system is written in the C programming language, we also use stack memory.
All of the above are based on one principle:
Always pre-allocate everything at compile-time.
This means that we know, beforehand, how much memory is being used. If we use too much, we won’t even be able to compile the code. Much less run it.
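To make the pre-allocation principle concrete, here is a minimal sketch of a fixed-size block pool in the spirit of memb. It is not the Thingsquare implementation: the names pool_alloc() and pool_free(), the block size, and the block count are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool parameters, chosen only for this example. */
#define POOL_BLOCK_SIZE 32
#define POOL_NUM_BLOCKS 8

/* The whole pool is a static array, so its size is known at build
 * time. If it does not fit in RAM, the build fails instead of the
 * device failing at runtime. */
static _Alignas(max_align_t) uint8_t pool_mem[POOL_NUM_BLOCKS][POOL_BLOCK_SIZE];
static uint8_t pool_used[POOL_NUM_BLOCKS];

/* Return a free block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
  for (size_t i = 0; i < POOL_NUM_BLOCKS; i++) {
    if (!pool_used[i]) {
      pool_used[i] = 1;
      return pool_mem[i];
    }
  }
  return NULL;
}

/* Give a block back to the pool. Returns 0 on success, or -1 if the
 * pointer is not the start of a block in this pool. */
int pool_free(void *ptr)
{
  for (size_t i = 0; i < POOL_NUM_BLOCKS; i++) {
    if (ptr == (void *)pool_mem[i]) {
      assert(pool_used[i]); /* catch double frees in debug builds */
      pool_used[i] = 0;
      return 0;
    }
  }
  return -1;
}
```

The worst case is visible up front: this pool can never take more than POOL_NUM_BLOCKS * POOL_BLOCK_SIZE bytes, and running out shows up as a NULL return that the caller has to handle.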
What we end up with is a memory layout that looks like this:
The stack starts at the end of the memory and grows downwards. But we need to be careful; more about this below.
The stack is followed by a bunch of memb, mmem, and bmem blocks, as well as other data that the code uses.
Because the memory is so tight, there isn’t much air in that memory layout. Typically, it is almost 100% used.
malloc() might kill you
Note that there is one memory allocation mechanism that is not in the list above: the C language standard malloc()/free() mechanism.
Why?
Because with malloc()/free(), we don’t know beforehand how much memory will be used.
We may end up using more than we’d expect, at runtime. And then, it may be too late. The device may already be dead.
Our devices are deployed in difficult-to-reach places. They need to be always available. We can’t afford for them to run out of memory when we least expect it.
In fact, we go to great lengths to avoid any surprises.
We have an extensive test setup, which we run on every code change. This includes running the system in a set of network simulators, so that we can know for sure that the system works as expected. And during manufacturing, we do production testing to make sure that the hardware works.
So we can’t use malloc()/free(). And we don’t.
The stack doesn’t play nice
In the memory layout, the stack is a large chunk. How do we know how much to allocate to the stack?
The stack memory is somewhat tricky to allocate, because its maximum size is determined at runtime.
There is no silver bullet: we can’t predict it. So we need to measure it.
The way we do it is simple: at boot, we fill the stack memory with a known byte pattern. We then run the system. The system will use the stack. This will overwrite that byte pattern.
After we have run the system for a while, we check how much of that byte pattern is left. That gives us an idea of how much stack the system uses.
We then allocate a little more stack memory than what we think we need. To be on the safe side.
But we have one trick left: we can keep measuring that byte pattern, even when devices are deployed.
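The measurement can be sketched in a few lines of C. This is an illustration only: the fill value 0xAA and the function names are assumptions, and the stack region is passed in as a plain buffer so the code can also be tried on a desktop machine. On a real target, the region bounds would typically come from the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xAA /* assumed pattern; any unlikely value works */

/* Paint a memory region with the known pattern. On a real target this
 * would run early at boot, over the not-yet-used part of the stack. */
void stack_paint(uint8_t *region, size_t size)
{
  assert(region != NULL);
  memset(region, STACK_FILL_BYTE, size);
}

/* Count how many bytes at the low end of the region still hold the
 * pattern. The stack grows downwards, so the untouched bytes sit at
 * the lowest addresses. */
size_t stack_unused(const uint8_t *region, size_t size)
{
  size_t n = 0;
  while (n < size && region[n] == STACK_FILL_BYTE) {
    n++;
  }
  return n;
}

/* High-water mark: the deepest the stack has reached so far. */
size_t stack_max_used(const uint8_t *region, size_t size)
{
  return size - stack_unused(region, size);
}
```

Calling stack_max_used() periodically and reporting the value is one way to keep measuring on deployed devices. The figure is a lower bound: a stack byte that happens to hold the fill value is counted as unused.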
That bug
So all that leads up to that interesting bug that we stumbled across.
We found that some devices, in the field, were using more stack than we had assumed. And that was surprising. Because we had made really sure that the stack size was correct.
This made the device behave strangely. Unpredictably.
So what was going on?
We took a deeper look at one of the devices, in the lab.
We added more places in the code where we measured the stack usage. To try to home in on the part of the code that caused the overflow.
And we found it:
rand();
Huh?
Yes, the standard C library function that produces pseudo-random numbers.
This caused the stack to overflow.
We use rand() in a few places in the code, where we need a quick pseudo-random number that doesn’t need to be cryptographically secure.
It is a simple function that should not use much stack.
So why was it overflowing the stack?
The culprit: rand()
The Thingsquare system uses the newlib standard C library. This is open source, so we can look at the code.
This is the code of the rand_r() function, the reentrant variant of rand(), which looks familiar:
int
rand_r (unsigned int *seed)
{
  long k;
  long s = (long)(*seed);
  if (s == 0)
    s = 0x12345987;
  k = s / 127773;
  s = 16807 * (s - k * 127773) - 2836 * k;
  if (s < 0)
    s += 2147483647;
  (*seed) = (unsigned int)s;
  return (int)(s & RAND_MAX);
}
Why would this code use so much stack that it blew through its bounds?
There are no large arrays or structs allocated on the stack.
There is no recursion.
But it turns out that the problem is not in this code. It is in the code that calls this code.
The newlib library has a reentrancy layer that makes it possible to call functions multiple times, simultaneously.
And this reentrancy code is implemented with C macros. It is difficult to understand at first glance. This is how the actual rand() function looks:
int
rand (void)
{
  struct _reent *reent = _REENT;
  /* This multiplier was obtained from Knuth, D.E., "The Art of
     Computer Programming," Vol 2, Seminumerical Algorithms, Third
     Edition, Addison-Wesley, 1998, p. 106 (line 26) & p. 108 */
  _REENT_CHECK_RAND48(reent);
  _REENT_RAND_NEXT(reent) =
    _REENT_RAND_NEXT(reent) * __extension__ 6364136223846793005LL + 1;
  return (int)((_REENT_RAND_NEXT(reent) >> 32) & RAND_MAX);
}
Maybe these macro calls are the problem?
And, yes, as it turns out, they are.
Deep inside that _REENT_CHECK_RAND48() macro, we find:
/* Generic _REENT check macro. */
#define _REENT_CHECK(var, what, type, size, init) do { \
  struct _reent *_r = (var); \
  if (_r->what == NULL) { \
    _r->what = (type)malloc(size); \
    __reent_assert(_r->what); \
    init; \
  } \
} while (0)
Oooops, a malloc()! That one killer function that we wanted to avoid.
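The pattern hiding in that macro is lazy allocation: the state is allocated on the first call and reused afterwards. The following stripped-down stand-in shows the effect. It is not newlib code: lazy_rand(), struct rand_state, and the allocations counter are invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical generator state, allocated on first use. */
struct rand_state {
  unsigned long long next;
};

static struct rand_state *state; /* NULL until the first call */
static int allocations;          /* counts malloc() calls, for illustration */

int lazy_rand(void)
{
  if (state == NULL) {
    /* The hidden cost: the very first call reaches for the heap. */
    state = malloc(sizeof(*state));
    assert(state != NULL);
    allocations++;
    state->next = 1;
  }
  /* Same multiplier as in the rand() listing above. */
  state->next = state->next * 6364136223846793005ULL + 1;
  return (int)((state->next >> 32) & 0x7fffffff);
}
```

From the caller’s side nothing looks unusual: the signature is the same as rand(), and only the first call behaves differently. That is why this kind of allocation is easy to miss in a code review.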
But is it really used?
Yes: in the compiled code, we see malloc() being called:
0001ff80 <rand>:
1ff80: 4b16 ldr r3, [pc, #88] ; (1ffdc <rand+0x5c>)
1ff82: b510 push {r4, lr}
1ff84: 681c ldr r4, [r3, #0]
1ff86: 6ba3 ldr r3, [r4, #56] ; 0x38
1ff88: b9b3 cbnz r3, 1ffb8 <rand+0x38>
1ff8a: 2018 movs r0, #24
1ff8c: f7ff fee4 bl 1fd58 <malloc>
1ff90: 4602 mov r2, r0
1ff92: 63a0 str r0, [r4, #56] ; 0x38
1ff94: b920 cbnz r0, 1ffa0 <rand+0x20>
1ff96: 4b12 ldr r3, [pc, #72] ; (1ffe0 <rand+0x60>)
1ff98: 4812 ldr r0, [pc, #72] ; (1ffe4 <rand+0x64>)
1ff9a: 214e movs r1, #78 ; 0x4e
1ff9c: f000 f952 bl 20244 <__assert_func>
1ffa0: 4911 ldr r1, [pc, #68] ; (1ffe8 <rand+0x68>)
1ffa2: 4b12 ldr r3, [pc, #72] ; (1ffec <rand+0x6c>)
1ffa4: e9c0 1300 strd r1, r3, [r0]
1ffa8: 4b11 ldr r3, [pc, #68] ; (1fff0 <rand+0x70>)
1ffaa: 6083 str r3, [r0, #8]
1ffac: 230b movs r3, #11
1ffae: 8183 strh r3, [r0, #12]
1ffb0: 2100 movs r1, #0
1ffb2: 2001 movs r0, #1
1ffb4: e9c2 0104 strd r0, r1, [r2, #16]
1ffb8: 6ba4 ldr r4, [r4, #56] ; 0x38
1ffba: 4a0e ldr r2, [pc, #56] ; (1fff4 <rand+0x74>)
1ffbc: 6920 ldr r0, [r4, #16]
1ffbe: 6963 ldr r3, [r4, #20]
1ffc0: 490d ldr r1, [pc, #52] ; (1fff8 <rand+0x78>)
1ffc2: 4342 muls r2, r0
1ffc4: fb01 2203 mla r2, r1, r3, r2
1ffc8: fba0 0101 umull r0, r1, r0, r1
1ffcc: 1c43 adds r3, r0, #1
1ffce: eb42 0001 adc.w r0, r2, r1
1ffd2: e9c4 3004 strd r3, r0, [r4, #16]
1ffd6: f020 4000 bic.w r0, r0, #2147483648 ; 0x80000000
1ffda: bd10 pop {r4, pc}
1ffdc: 20000770 .word 0x20000770
1ffe0: 00027bfc .word 0x00027bfc
1ffe4: 00027c13 .word 0x00027c13
1ffe8: abcd330e .word 0xabcd330e
1ffec: e66d1234 .word 0xe66d1234
1fff0: 0005deec .word 0x0005deec
1fff4: 5851f42d .word 0x5851f42d
1fff8: 4c957f2d .word 0x4c957f2d
To achieve reentrancy, the newlib code uses malloc() to allocate state for its random number generator, so that it can be called multiple times.
Just what we wanted to avoid.
But why does malloc() result in the stack being blown?
Because malloc(), in its default implementation, uses memory between the last allocated byte and the stack. Typically, for large systems, this is fine. Because there is plenty of free memory between the last allocated byte and the stack.
But not in our case.
We don’t have much free memory. So that call to malloc() will interfere with the stack, immediately.
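To picture the collision, here is a small simulation of a heap that grows upwards from the last allocated byte towards the stack. On newlib-based systems, malloc() typically gets its memory from an sbrk-style hook supplied by the platform. The version below is a made-up model of such a hook, not code from newlib or from the Thingsquare system, and all sizes and offsets are invented. Unlike the default behaviour described above, it checks against the stack region before handing out memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated RAM; all sizes and offsets are made up for the example. */
#define RAM_SIZE 1024u
static uint8_t ram[RAM_SIZE];

static size_t heap_end = 900u;     /* offset just past the static data */
static size_t stack_bottom = 924u; /* lowest offset reserved for the stack */

/* Grow the heap by incr bytes, the way an sbrk-style hook would, but
 * refuse when the new break would run into the stack region. */
void *sim_sbrk(size_t incr)
{
  assert(heap_end <= stack_bottom);
  if (incr > stack_bottom - heap_end) {
    return NULL; /* would collide with the stack */
  }
  void *prev = &ram[heap_end];
  heap_end += incr;
  return prev;
}
```

With only 24 free bytes between the data and the stack in this model, one 24-byte request (the size loaded into r0 before the call in the listing above) uses up the entire gap. A hook without the bounds check would simply hand out memory that the stack is also using.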
And luckily we were able to catch this by keeping a check on the stack.
But why did this happen now? We have been running this code for years without any problems. As it turns out, the reason is that we recently upgraded the arm-gcc version. And this version has its newlib built with reentrancy support, which the previous versions did not have.
The solution?
Fortunately, the solution is simple.
We just stop using rand().
Instead, we provide our own pseudo-random function. For example, the PCG random number generator.
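For reference, a 32-bit PCG generator fits in a handful of lines and needs no heap at all. The sketch below follows the widely published minimal PCG32 (XSH-RR) scheme. The type and function names are chosen for this example, and it is not necessarily the variant used in the Thingsquare system.

```c
#include <assert.h>
#include <stdint.h>

/* All generator state lives in a caller-owned struct: no malloc(). */
typedef struct {
  uint64_t state; /* LCG state, advanced on every call */
  uint64_t inc;   /* stream selector, always odd */
} pcg32_t;

/* Advance the state and return the next 32-bit pseudo-random value. */
uint32_t pcg32_next(pcg32_t *rng)
{
  uint64_t old = rng->state;
  rng->state = old * 6364136223846793005ULL + rng->inc;
  /* Output function: xorshift the high bits, then rotate right. */
  uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
  uint32_t rot = (uint32_t)(old >> 59u);
  return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

/* Initialise the generator from a seed and a stream id. */
void pcg32_seed(pcg32_t *rng, uint64_t seed, uint64_t stream)
{
  assert(rng != 0);
  rng->state = 0u;
  rng->inc = (stream << 1u) | 1u;
  (void)pcg32_next(rng);
  rng->state += seed;
  (void)pcg32_next(rng);
}
```

The state can be a static variable or part of an existing struct, so its 16 bytes are accounted for at build time like everything else. The multiplier is the same Knuth constant that appears in newlib’s rand(). Like rand(), this generator is not suitable for cryptographic use.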
Additionally, we added another regression test that explicitly checks for occurrences of the malloc() code in generated binaries.