Sorry, Wrong Number: Debugging a Crash Under Wine

On December 3rd, 2021, a friend of mine needed help: their program was crashing on Wine, and they wanted to know how to fix it.

(And I'm only getting around to publishing this now, several months later. I'm a bit disorganized…)

Normally, the answer to this question is not very complicated, because a lot of stuff works just fine in Wine provided the environment is set up correctly, and some stuff simply does not work. Usually problems fall into one of those two categories; however, in this case, there really wasn't any obvious reason (at least to me) why it shouldn't work.

The program in question is compiled with MSYS2's MinGW-w64 package, using GCC 10.3. It includes a few other libraries also compiled with the same toolchain, as DLLs, including libpng and zlib, which I am about to become very familiar with.

Some debugging had already been done, and they knew which call the application was failing in: a call to png_read_info. I ran it under Wine using the WINEDEBUG=+all option and generated a large log file of mostly useless information. After a bit of correlation, I found the last API call before the crash: an msvcrt._read, returning successfully… and then we crash.

After a bit of misdirection, I noticed an important detail that I had been glancing over for a while: the access violation was an execute, not a read or write. That means that RIP is landing in the middle of a page that is not executable. Hmmm. Stack corruption, somehow?

Something interesting about Wine is that you can run it under Valgrind, which, with a few flags, does actually work correctly. But upon doing this, I found nothing particularly interesting, certainly nothing that would suggest stack corruption, so I moved on.

At this point I decided to get out rr, a different debugger that can record and replay program execution. Honestly, it's a bit overkill here, but it does help you analyze crashes, and this seemed like a good excuse to pull it out. There is a bit of trickiness with using rr on top of Wine, but it works more or less just fine; it's just a bit of a pain to get the replay working. I never quite figured out how to get debug symbols to map correctly with this Wine-under-GDB setup going on, so I had to manually explore the address space to figure out what I was looking at.

After much ado, the program crashes into… nowhere. It crashes at 0x2'fe8f'2910. Nothing is mapped here. Hmm.

Using the magic of rr, I can replay to some point shortly before the crash and then step into it. A few hundred stepi's later, I found the culprit: e8 20 45 7e 96, at the address 0x3'6810'e3eb. AKA, CALL 0x2fe8f2910. In other words: there is an explicit CALL to nowhere.

At this point, I threw libpng in a disassembler, and found the instruction at 0x36810e3eb. The instruction?

A screenshot showing a CALL instruction, CALL near ptr crc32 (E8 78 9E 03 00)
A call to… crc32?

Weird. That CALL has a totally different target. It is e8 78 9e 03 00, not e8 20 45 7e 96, which is nonsense and points backwards, to an address before the module itself.
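To make the comparison concrete, here is a small sketch (mine, not part of the original debugging session) that decodes both byte sequences. It relies only on the standard encoding of E8: a little-endian signed 32-bit displacement measured from the end of the 5-byte instruction. The function name call_rel32_target is made up for illustration.

```python
import struct

def call_rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the absolute target of a 5-byte E8 (CALL rel32) instruction."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 CALL rel32 instruction")
    # The operand is a little-endian *signed* 32-bit displacement...
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    # ...relative to the address of the next instruction.
    return insn_addr + len(insn_bytes) + rel32

ADDR = 0x36810E3EB  # address of the CALL, from the post

in_disassembler = bytes.fromhex("e8 78 9e 03 00")  # what the disassembler shows
at_runtime = bytes.fromhex("e8 20 45 7e 96")       # what rr showed at the crash

print(hex(call_rel32_target(ADDR, in_disassembler)))  # 0x368148268
print(hex(call_rel32_target(ADDR, at_runtime)))       # 0x2fe8f2910
```

The second result matches the crash address, so the bytes really were rewritten between the file on disk and the moment of the crash.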

So who's modifying the CALL? Is it the program? Is it libpng? Is it Wine?

A screenshot of Fred Jones from Scooby Doo imminently unmasking a perpetrator.

One thing we do know about the .text section is that it's read-only. Of course, you should still at least check this in your disassembler, but I did, and indeed, it's read-only. That means that in order to modify the section, someone would have to intentionally mark it writable. On UNIX-like platforms, you would use a syscall like mprotect, whereas Windows provides VirtualProtect in kernel32. Thankfully, there is really no way that libpng would link to VirtualPro

A screenshot of the IDA imports panel, showing VirtualProtect and VirtualQuery being imported from KERNEL32 by the libpng DLL.

…what exactly is libpng doing calling this?

A call graph showing VirtualProtect being called by sub_36812E420, which is called by sub_36812E590, which is called by sub_3680F1200 (which is, effectively, the DLL's entrypoint.)

Interestingly, it reaches back to sub_3680F1200, which is just the entry point of the DLL. (There is a stub over at the "real" entry point, but IDA does not count the jmp in the call graph, so you can't see it here.)

In order to try to figure out what this code was, I used the tried and true technique of searching for interesting strings, and quickly found a few, but the most interesting was this one: "Unknown pseudo relocation bit size %d". Hrm, what's a pseudo relocation?

I've mostly glossed over a lot of the lower level details in this post, but I think this one deserves some more attention.

What's a normal relocation?

Before answering what a pseudo relocation is, I'd like to talk about normal relocations. When a linker links a program module, it has to pick some arbitrary "base address" to use for position-dependent code and data. What does that mean? Let's say you have a global, statically-initialized variable that is a pointer to another global variable. This is allowed. The pointer written into the executable file during compilation (specifically linking) is the address that would be correct if the program module was loaded at its preferred base address. Much code and data is position-independent, and thus doesn't need relocations, but anywhere an absolute address needs to be written, such as static pointers, relocations will be needed.

However, being loaded at your preferred base address is a little rare nowadays. For one thing, nearly all executable loaders have to support relocating the module to a different base address, because otherwise, it would be impossible to simultaneously load two modules whose preferred base addresses result in an overlap, and these can't be coordinated ahead of time in most cases. In addition, modern AMD64 machines have plenty of address space, so for security reasons, a mitigation called ASLR is almost always used, which basically just randomizes the base address of program modules even if they're not initially overlapping. (This is a bit of an oversimplification.)

If we relocate (that is, change the location of) the program module in memory, the addresses that the linker had to write based off of the preferred base address no longer line up, because the module is now at a different address, and all offsets are effectively shifted by some value. In order to adjust this pointer, the linker stores a relocation entry in the binary during compilation for each instance of position-dependent code or data, such as our pointer. At runtime, the executable loader or runtime linker will read each relocation entry and adjust it according to the type of relocation and the offset, adding the difference between the preferred base and the actual base directly to the value present at the address. As long as there isn't any inadvertent position-dependent code not accounted for by relocations, everything will work perfectly fine.
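As a rough illustration of that adjustment, here is a toy model (not a real PE loader) of the simplest case, a 64-bit absolute pointer, which is what PE calls a DIR64 base relocation. The base addresses, the image layout, and the function name apply_base_relocations are all invented for the example.

```python
import struct

U64_MASK = 0xFFFFFFFFFFFFFFFF

def apply_base_relocations(image: bytearray, reloc_offsets, preferred_base: int, actual_base: int) -> None:
    """Add (actual_base - preferred_base) to every 64-bit pointer listed in reloc_offsets."""
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, (value + delta) & U64_MASK)

PREFERRED_BASE = 0x140000000     # hypothetical preferred base chosen by the linker
ACTUAL_BASE = 0x7FF612340000     # hypothetical base chosen by the loader (e.g. ASLR)

# Toy image: a global lives at offset 0x10, and a static pointer to it
# lives at offset 0x08. The linker writes the pointer as if the module
# were loaded at PREFERRED_BASE, and records offset 0x08 as a relocation.
image = bytearray(0x20)
struct.pack_into("<Q", image, 0x08, PREFERRED_BASE + 0x10)
reloc_offsets = [0x08]

apply_base_relocations(image, reloc_offsets, PREFERRED_BASE, ACTUAL_BASE)
(ptr,) = struct.unpack_from("<Q", image, 0x08)
print(hex(ptr))  # 0x7ff612340010, i.e. ACTUAL_BASE + 0x10
```

Real relocation tables are grouped per page and carry a type for each entry; the sketch keeps only the core idea of adding one delta to every recorded slot.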

Modern Windows uses the Portable Executable format. The PE format contains ~9 or so different types of relocation entries, some of which vary depending on CPU architecture. The main reason this is necessary is to handle different relocations that adjust CPU instructions, where the address may be encoded into the CPU instruction in different ways that the relocation needs to be aware of. PE handles, for example, special cases for the MIPS, ARM (32-bit), and RISC-V instruction sets. (UEFI uses the PE binary format for its binaries, so in order for RISC-V to be able to support UEFI, the PE binary format needed to add support for RISC-V, too. Fun fact!)

Microsoft Windows vs. Everything Else

The thing is, though, in many other operating systems, particularly UNIX-likes, relocations and symbols are different. Whereas Windows binaries have explicit Imports and Exports, and a vector of pointers called the IAT (Import Address Table), ELF binaries have a single unified symbol table. And in terms of relocations, ELF has many more types of relocations than PE.

Why does this matter? The answer has everything to do with linkage.

When you compile some C code, and it references a symbol which is not defined in that translation unit, it is treated as an external symbol. Later, during linking, when the linker resolves that symbol, it can place the address of the symbol where needed, and thus, the symbol is resolved.

This becomes a problem when linking to other libraries and modules; the address of the symbol is not actually known until runtime, when those libraries and modules are loaded in. Because of that, you need to generate different code: code that resolves the address at runtime, then uses that address. At least on Windows, it would not be normal to generate code like this for every external symbol; it would be slow and wasteful.

Thankfully, there is a workaround for function calls: the linker can generate a thunk, a small routine that forwards the call through the IAT, and then that thunk can be used as the address to write in for the CALL instruction.

But what if you reference a data symbol from another library or module? Or, if you try to get the address of a function symbol from another library or module? That is a problem. You need to generate the aforementioned code which resolves the address first, and the compiler, not knowing that this is the case, will generate the wrong code, and the linker will not be able to handle it.

With ELF, you definitely can do this, using symbol-relative relocations. With PE, you are S.O.L. (Simply Out of Luck.) Or at least, you would usually be.

What a pseudo-relocation is.

Of course, it is possible to link to data symbols on Windows. This is what all of that __declspec(dllimport) business is for: you can specify it on declarations of external symbols, and that way the compiler can generate code which is appropriate for linking to an external symbol in another library.

So what's the problem?

Well, a lot of code written for UNIX-likes does not mark its symbols with __declspec(dllimport), given that it is a Windows-only feature. MinGW needs to support compiling these programs, and in order to support that, it has invented the concept of pseudo-relocations. (Or maybe Cygwin invented this concept; I am not sure.) A pseudo-relocation is a "fake" relocation handled at runtime, by the library itself, after the real relocations are done. It does this by, at the entrypoint, walking through a list of pseudo-relocations and adjusting the pointers with some offset relative to an IAT entry. These pseudo-relocations implement symbol-relative imports, just like ELF systems.

In other words… Pseudo-relocations are a MinGW feature that implements a different kind of "relocation" where a pointer in code or data is modified with an address relative to an imported symbol.
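Here is a minimal sketch of that adjustment as I understand it: the slot initially holds a value expressed relative to the IAT entry, and the fix-up swaps "address of the IAT entry" for "the address stored in the IAT entry". This is an illustrative model, not MinGW's actual source; the function name apply_pseudo_reloc and all addresses below are invented.

```python
def apply_pseudo_reloc(stored: int, iat_entry_addr: int, iat_entry_value: int, bits: int) -> int:
    """Return the new contents of a `bits`-wide slot after a pseudo-relocation.

    stored          -- what the linker left in the slot (relative to the IAT entry)
    iat_entry_addr  -- where the IAT entry for the imported symbol lives
    iat_entry_value -- what the loader wrote there: the symbol's real address
    """
    mask = (1 << bits) - 1
    stored &= mask
    # Sign-extend the stored value from `bits` bits.
    if stored & (1 << (bits - 1)):
        stored -= 1 << bits
    # Re-point the slot from the IAT entry to the imported symbol itself.
    new_value = stored - iat_entry_addr + iat_entry_value
    # Only `bits` bits can be written back; anything above that is lost.
    return new_value & mask

# Hypothetical 64-bit data pointer to "imported_array + 8":
IAT_ENTRY_ADDR = 0x5000              # invented address of the IAT entry
IMPORTED_SYMBOL = 0x7FFE00001000     # invented address the loader resolved

slot = IAT_ENTRY_ADDR + 8            # what the linker wrote
slot = apply_pseudo_reloc(slot, IAT_ENTRY_ADDR, IMPORTED_SYMBOL, bits=64)
print(hex(slot))  # 0x7ffe00001008
```

With a 64-bit slot the final mask never loses anything. With narrower slots it can, which is worth keeping in mind.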

(Truth be told, it is unclear why MinGW has decided that a call to crc32 needs a pseudo-relocation, but it seems like it can happen if you pass flags to prevent thunks from being generated, and the wrong definitions to zlib to prevent it from using the right attributes on its symbols.)

Hopefully, you have at least as good an idea as I do about why pseudo-relocations exist, and what problem they're supposed to solve… but a problem remains:

Why does this code, which works under Windows, break under Wine? If you are particularly keen, you may have already gotten an idea of what is going on, but just to be sure we will dive into exactly what's happening.

In order to get more insight into what's going on, we can debug the pseudo-relocation implementation. I don't have debug symbols for it, but that is not a big deal, since the pseudo-relocation code compiles down to fairly succinct and understandable machine code.

Wine provides a GDB server, so you can attach a variety of different debuggers. However, I hit a significant limitation with winedbg right away: it seems to execute the loader before we have a chance to insert breakpoints, which means the libpng entrypoint has already run before we get the chance to break on it. This leaves us with a few different options:

  • We could fudge the was_init variable. It's a global that the pseudo-reloc code uses to figure out if it has run already; the DLL entrypoint runs for new threads, so we could just break on that, then adjust was_init so that it runs again anyways. This probably works in this case, but it's not very flexible.
  • We can forgo winedbg and simply run Wine itself under GDB.

Wine under GDB

When I used rr earlier, I was basically already doing this. However, rr is pretty overkill for this problem, and actually introduces some complexity of its own, so it's probably easier to just forgo it for now.

I am on NixOS, where the wine binary on the $PATH is actually a shell script. We'll tell GDB to execute bash first.

 gdb --args bash wine Game.exe

That's not going to work just yet; we need to adjust the behaviors on fork and exec; then we can go.

(gdb) set follow-fork-mode child
(gdb) set follow-exec-mode new
(gdb) catch fork
Catchpoint 1 (fork)
(gdb) catch exec
Catchpoint 2 (exec)
(gdb) run

We can continue a few times until we're finally in our target binary, then flip follow-fork-mode back to parent and follow-exec-mode back to same. I want to set a breakpoint on the point at which the was_init variable is flagged. Because Wine does not implement ASLR, our libraries end up at their preferred base addresses, so lacking symbols in GDB, I can just enter the raw addresses as determined by digging around in IDA:

(gdb) break *0x36812E5C8
Breakpoint 3 at 0x36812e5c8

GDB doesn't give us a whole lot of feedback. Something useful you can do is have GDB print the disassembly at the instruction pointer for you:

(gdb) display/i $pc
1: x/i $pc
=> 0x36812e5c8: movl   $0x1,0x14b0e(%rip)        # 0x3681430e0

Now, as we step through, we can see what instruction we are on.

In this case, the +0x14b0e address is was_init. This instruction may look odd since pseudo-reloc has ++was_init instead of was_init=1, but I think we can assume that the compiler has optimized it, knowing that was_init is zero due to the conditional beforehand. Neat.
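For anyone wondering how GDB gets from 0x14b0e(%rip) to the absolute address in its trailing comment: RIP-relative operands are measured from the end of the instruction. A quick check, assuming this movl is 10 bytes long (opcode, ModRM, a 4-byte displacement and a 4-byte immediate; that length is my inference from the encoding, not something shown in the GDB output):

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Absolute address referenced by a RIP-relative operand."""
    # RIP already points at the *next* instruction when the operand is resolved.
    return insn_addr + insn_len + disp32

INSN_ADDR = 0x36812E5C8   # address of the movl, from the GDB output
INSN_LEN = 10             # assumed: C7 05 <disp32> <imm32>
DISP = 0x14B0E            # displacement shown in the disassembly

print(hex(rip_relative_target(INSN_ADDR, INSN_LEN, DISP)))  # 0x3681430e0
```

That agrees with the 0x3681430e0 that GDB prints, so was_init lives there.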

This could get pretty tedious if we tried to understand and display every instruction. I've already analyzed the function and found the relocations, so I should be able to set a conditional breakpoint that gets me into the exact place I want to be.

IDA Pro screenshot showing a number of runtime_pseudo_reloc_item_v2 structures, highlighting the one that covers the offset of interest for us.

IDA Pro annoyingly shows the same address for all relocations because it's marked as an array. That's OK; we just need to calculate it out, and make a quick breakpoint:

(gdb) break *0x36812E69D if $rbx==0x36813DF4C
Breakpoint 5 at 0x36812e69d
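"Calculating it out" is just array arithmetic. As a sketch: each runtime_pseudo_reloc_item_v2 is three 32-bit fields (sym, target, flags), so 12 bytes, which matches the 0x4(%rbx) and 0x8(%rbx) accesses in the disassembly. The array start, the index, and the image base below are made-up values chosen so the numbers line up with the session; I did not record the real ones.

```python
import struct

# One runtime_pseudo_reloc_item_v2: sym RVA, target RVA, flags (low byte = bit size).
ITEM = struct.Struct("<III")

def item_address(array_start: int, index: int) -> int:
    """Address of the index-th item in a packed array of v2 items."""
    return array_start + index * ITEM.size

def parse_item(raw: bytes, image_base: int) -> dict:
    """Decode one item into absolute addresses and a bit size."""
    sym_rva, target_rva, flags = ITEM.unpack(raw)
    return {
        "iat_entry": image_base + sym_rva,
        "target": image_base + target_rva,
        "bits": flags & 0xFF,
    }

# All three constants are hypothetical, for illustration only.
ARRAY_START = 0x36813DF10
INDEX = 5
IMAGE_BASE = 0x3680F0000

print(hex(item_address(ARRAY_START, INDEX)))  # 0x36813df4c

raw = ITEM.pack(0x58268, 0x1E3EC, 0x20)  # an item shaped like the one of interest
item = parse_item(raw, IMAGE_BASE)
print(hex(item["iat_entry"]), hex(item["target"]), item["bits"])
```

The resulting address is what goes into the $rbx condition on the breakpoint.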

And now we can continue. Once we're there, we can step around:

(gdb) stepi
0x000000036812e69f in ?? ()
1: x/i $pc
=> 0x36812e69f: mov    0x4(%rbx),%esi
0x000000036812e6a2 in ?? ()
1: x/i $pc
=> 0x36812e6a2: movzbl 0x8(%rbx),%edx
0x000000036812e6a6 in ?? ()
1: x/i $pc
=> 0x36812e6a6: add    %r13,%rax
0x000000036812e6a9 in ?? ()
1: x/i $pc
=> 0x36812e6a9: add    %r13,%rsi
0x000000036812e6ac in ?? ()
1: x/i $pc
=> 0x36812e6ac: mov    (%rax),%r15
0x000000036812e6af in ?? ()
1: x/i $pc
=> 0x36812e6af: cmp    $0x20,%edx
0x000000036812e6b2 in ?? ()
1: x/i $pc
=> 0x36812e6b2: je     0x36812e7a8

This is just a switch statement compiled into a conditional tree. As luck would have it, all of our relocations are 32-bit (0x20), so this first conditional hits immediately.

0x000000036812e7a8 in ?? ()
1: x/i $pc
=> 0x36812e7a8: mov    (%rsi),%edx
0x000000036812e7aa in ?? ()
1: x/i $pc
=> 0x36812e7aa: mov    %rdx,%rcx
(gdb) nexti
0x000000036812e7ad in ?? ()
1: x/i $pc
=> 0x36812e7ad: or     %r14,%rdx
0x000000036812e7b0 in ?? ()
1: x/i $pc
=> 0x36812e7b0: test   %ecx,%ecx
0x000000036812e7b2 in ?? ()
1: x/i $pc
=> 0x36812e7b2: cmovns %rcx,%rdx
0x000000036812e7b6 in ?? ()
1: x/i $pc
=> 0x36812e7b6: mov    %rsi,%rcx
0x000000036812e7b9 in ?? ()
1: x/i $pc
=> 0x36812e7b9: sub    %rax,%rdx
0x000000036812e7bc in ?? ()
1: x/i $pc
=> 0x36812e7bc: add    %rdx,%r15
0x000000036812e7bf in ?? ()
1: x/i $pc
=> 0x36812e7bf: call   0x36812e420
0x000000036812e7c4 in ?? ()
1: x/i $pc
=> 0x36812e7c4: mov    %r15d,(%rsi)
0x000000036812e7c7 in ?? ()
1: x/i $pc
=> 0x36812e7c7: jmp    0x36812e694

This code right here is the culprit. It just wrote %r15d to the memory at the address pointed to by %rsi. What are these values?

(gdb) i r r15d
r15d           0x967e4520          -1770109664
(gdb) i r rsi
rsi            0x36810e3ec         14630839276

We have, indeed, located the culprit. It's writing the exact same sequence of bad bytes we saw earlier. What the heck is wrong with it? Why isn't it getting the right address for crc32? Why is it 0x1'0000'0000 too far forward?

If you haven't figured it out yet, this should do it:

# At this point, %rax points to where the IAT entry is,
# and %r15 holds the actual value in it.
rax            0x368148268         14631076456
r15            0x1fe8f2910         8565762320

# Read the value at the target into %edx.
# This is a pointer into the IAT.
mov    (%rsi),%edx

rsi            0x36810e3ec         14630839276
*rsi           0x39e78             237176
edx            0x20 -> 0x39e78     32 -> 237176

# Copy %rdx into %rcx.
mov    %rdx,%rcx
rdx            0x39e78             237176
rcx            0x0 -> 0x39e78      0 -> 237176

# Perform sign extension on the %rdx copy.
or     %r14,%rdx
r14            0xffffffff00000000  -4294967296
rdx            0x39e78 -> 0xffffffff00039e78  237176 -> -4294730120

# Test %ecx to set flags.
test   %ecx,%ecx
eflags         0x286               [ PF SF IF ]
ecx            0x39e78             237176

# This undoes the sign extension if the sign bit is not set.
cmovns %rcx,%rdx
eflags         0x206               [ PF IF ]
rcx            0x39e78             237176
rdx            0xffffffff00039e78 -> 0x39e78  -4294730120 -> 237176

# At this point we have undone the sign extension.

# Move relative offset of IAT into rcx, for mark_section_writable.
mov    %rsi,%rcx
rsi            0x36810e3ec         14630839276
rcx            0x39e78 -> 0x36810e3ec  237176 -> 14630839276

# %rax is the absolute address of the IAT entry. Subtract it from %rdx.
sub    %rax,%rdx
rax            0x368148268         14631076456
rdx            0x39e78 -> 0xfffffffc97ef1c10  237176 -> -14630839280

# Add %rdx to %r15.
add    %rdx,%r15
rdx            0xfffffffc97ef1c10  -14630839280
r15            0x1fe8f2910 -> 0xfffffffe967e4520  8565762320 -> -6065076960

# Call mark_section_writable
call   0x36812e420

# Write the pseudo-relocation back.
mov    %r15d,(%rsi)
rsi            0x36810e3ec         14630839276
*rsi           0x39e78 -> 0x967e4520  237176 -> -1770109664
r15d           0x967e4520          -1770109664

Did you catch it? The distance between the CALL instruction and its target is greater than what can be stored in a 32-bit value. The E8 CALL instruction can only jump between [-2^31, 2^31) bytes away from the RIP as of execution because it can only store a 32-bit signed offset. Unfortunately, the pseudo-reloc code simply failed silently back when I was debugging this, but I think it has been fixed and now outputs an error when this happens, so it shouldn't be so puzzling to future generations.
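The arithmetic is easy to replay outside the debugger. The sketch below uses only addresses from the session above; the helper names fits_rel32 and to_signed32 are my own, and the range check is the kind of test I would expect a fixed implementation to perform, not a quote of MinGW's code.

```python
def fits_rel32(displacement: int) -> bool:
    """True if the displacement is representable as a signed 32-bit offset."""
    return -(1 << 31) <= displacement < (1 << 31)

def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value

CALL_ADDR = 0x36810E3EB       # the E8 instruction in libpng
NEXT_INSN = CALL_ADDR + 5     # RIP as of execution
CRC32_ADDR = 0x1FE8F2910      # value found in the IAT entry (%r15)

needed = CRC32_ADDR - NEXT_INSN
print(hex(needed), fits_rel32(needed))   # -0x16981bae0 False

written = needed & 0xFFFFFFFF            # what mov %r15d,(%rsi) actually stores
print(hex(written))                      # 0x967e4520

landed = NEXT_INSN + to_signed32(written)
print(hex(landed))                       # 0x2fe8f2910
print(hex(landed - CRC32_ADDR))          # 0x100000000
```

The truncated offset sends the CALL exactly 0x1'0000'0000 past the real crc32, which is the crash address from the beginning.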

One last thing…

There is one more weird thing though. This program works on Windows, reliably. Obviously, it's not actually loading libraries at their preferred base addresses, or it would crash. So why is this happening?

Well, simple: Wine doesn't support ASLR, and the libraries, at their preferred addresses, end up too far apart for the pseudo-relocations.

However, the fact that it works seemingly reliably on Windows is very interesting. Perhaps an interesting exploration would be to see exactly why Windows ASLR seems to consistently pick addresses which are unproblematic. Perhaps it's because the first time after bootup that these particular modules load is in quick succession?

Regardless, now knowing how the problem can be fixed, it's hard to be motivated to dig too much deeper. It might be nice if Wine could have similar ASLR behavior to Windows, so that these problems are less likely to crop up only on Wine, but these problems could also occur on Windows with ASLR disabled, so it's probably not that important.

Overall, I had a good time debugging this problem. I am also really happy with how far Wine has come, and I do not think it is a coincidence that the problem we hit was not really Wine's fault. I am, however, a bit sad that I didn't get a chance to track down and fix a bad Wine bug, but all the more happy that the reason for this is that it simply didn't exist.

Maybe next time. 🙂
