Lingua Diabolis

Apr 02, 2025

Debugging loadlibrary Through Space and Time

In 2017 Tavis Ormandy released loadlibrary, a "library that allows native Linux programs to load and call functions from a Windows DLL". As a showcase, the code included mpclient, a program capable of loading Windows Defender's mpengine.dll and scanning files for malware on Linux. This is an impressive feat: mpengine.dll is a notoriously complex, ~20MB library that I tend to use to stress-test static analysis tools - getting it to actually execute on a different operating system is really something!

Unfortunately mpengine.dll gets significant updates almost every month: with 5% of functions changing or unmatched in a library this size, we are talking thousands of changes monthly. It's no surprise mpclient has old unresolved issues about crashes with no easy fixes. While I know about private projects that successfully utilized loadlibrary for researching Defender and even other AV engines, the library was mostly unmaintained for years.

A couple of weeks ago, however, WaffleSec reported success with a recent mpengine.dll version after they fixed some mock API's, sparking my interest in the project again. cube0x8, the author of a 64-bit fork (the original loadlibrary is 32-bit only), also entered the discussion, and now there's a new PR promising support for mpengine.dll again.

Over the past few days I worked on merging WaffleSec's changes into the 64-bit branch and encountered a particularly nasty bug that provided a great use-case for demonstrating the usefulness of loadlibrary: using Linux-based debugging tools to inspect code written for Windows. While similar tools are now available for Windows too, I think the demonstration of modern debugging techniques is educational by itself, and last but not least I secretly hope this post will bring in some more muscle for maintaining loadlibrary :)

Overview

From the loadlibrary README:

The peloader directory contains a custom PE/COFF loader derived from ndiswrapper. The library will process the relocations and imports, then provide a dlopen-like API.

With the program in memory, the main problem to solve is providing it with the interfaces it expects from Windows. In loadlibrary this is solved by mocking the Windows API: the peloader/winapi directory of the project contains minimal implementations for methods exposed by standard DLL's like USER32, and these methods are used to populate the import table of the loaded Portable Executable. Since mpengine.dll is "self-contained" (in fact, its large size is mostly the result of static linking) we can get away with e.g. not implementing different behaviors for different flags, or simply doing nothing in case of more complex requests, like ones for threading.
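The idea of filling an import table from a set of mocks can be sketched as a simple name lookup. This is only an illustration of the concept, not loadlibrary's actual code: `mock_export`, `mock_exports`, `resolve_import` and the two stand-in mocks are hypothetical names, and real mocks carry their proper Windows signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mock type: real mocks have varying signatures. */
typedef int (*mock_fn)(void);

/* Hypothetical registry entry: an import name and the mock that backs it. */
struct mock_export {
    const char *name;
    mock_fn     fn;
};

/* Two stand-in mocks that only return a fixed value. */
static int mock_get_lcid(void) { return 0x0400; }
static int mock_init_sd(void)  { return 1; }

static const struct mock_export mock_exports[] = {
    { "GetUserDefaultLCID",           mock_get_lcid },
    { "InitializeSecurityDescriptor", mock_init_sd  },
};

/* Look up a mock by import name; NULL means this API is not mocked yet. */
static mock_fn resolve_import(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof mock_exports / sizeof mock_exports[0]; i++) {
        if (strcmp(mock_exports[i].name, name) == 0)
            return mock_exports[i].fn;
    }
    return NULL;
}
```

A loader built this way would walk the import names of the DLL, call something like `resolve_import()` for each and store the result in the matching import slot. An import that resolves to nothing is exactly the kind of gap that has to be closed by writing a new mock.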

Since we don't expect our code to run in untrusted environments, memory is not randomized, which helps analysis and debugging. In my tests:

  • The .text section of mpengine.dll started at 0x75a101000
  • mpclient's code was mapped at 0x55bf2bb99000

The Merge

So I merged cube0x8's and WaffleSec's branches, resolved conflicts and got a SEGFAULT on the first run:

Program received signal SIGSEGV, Segmentation fault.
0x00000000ffffde70 in ?? ()                         
(gdb)

The instruction pointer is clearly off in the wilderness, but fortunately we have a partial backtrace:

(gdb) bt                       
#0  0x00000000ffffde70 in ?? ()
#1  0x000000075a11e16b in ?? ()
#2  0x000000075ae21a88 in ?? ()
#3  0x00007fffffffe1f0 in ?? ()
#4  0x0000000000000000 in ?? ()

At #1 we see an address inside mpengine.dll, so let's look at it in Ghidra (here's how to use Ghidra to debug loadlibrary with symbols for mpengine.dll):

75a11e165 CALL       qword ptr [->ADVAPI32.DLL::InitializeSecurityDescriptor]
75a11e16b TEST       EAX,EAX

A quick grep shows that InitializeSecurityDescriptor is not present in our mock API yet, so let's create it:

STATIC BOOL WINAPI InitializeSecurityDescriptor(
  PVOID pSecurityDescriptor,
  DWORD dwRevision
){
    DebugLog("Returning success from InitializeSecurityDescriptor");
    return 1;
}

DECLARE_CRT_EXPORT("InitializeSecurityDescriptor", InitializeSecurityDescriptor);

I just expose an empty function, as there is no one to check the security descriptor anyway. By iterating this process I ended up mocking some more API's. Some of my first attempts quickly killed mpclient as I forgot to include the WINAPI macro in the declaration: this results in mpengine.dll calling the import with a different calling convention (RCX, RDX, R8, R9, stack) than the one expected by my implementation (RDI, RSI, RDX, RCX, R8, R9, stack), causing quick and merciless segfaults.

Interestingly, it seems the newly implemented API's were not needed by WaffleSec when testing the 32-bit DLL. I spent quite some time trying to figure out why the two binaries behave differently, and found that the "TDT" component of mpengine.dll does some pretty detailed platform detection which can explain divergent code paths on different architectures, but I didn't identify the point of divergence: my mock API's worked well enough and I encountered a much more worrying bug.

The Bug

This is what our little bug looked like:

Program received signal SIGSEGV, Segmentation fault.
0x000000075a12bb96 in ?? ()
(rr) x/8i $rip
=> 0x75a12bb96: mov    rcx,QWORD PTR [rax]
   0x75a12bb99: call   0x75a12b5a8
   0x75a12bb9e: mov    DWORD PTR [rbx+0x50],eax
   0x75a12bba1: mov    rax,QWORD PTR [rbx+0x8]
   0x75a12bba5: lea    r8,[rip+0xd0fd06]        # 0x75ae3b8b2
   0x75a12bbac: lea    rdx,[rip+0xd0fcfe]        # 0x75ae3b8b1
   0x75a12bbb3: mov    rcx,QWORD PTR [rax]
   0x75a12bbb6: call   0x75a12b5a8
(rr) i r
rax            0x69006e0075002f    29555345008689199
rbx            0x7fffdc0c5bc0      140736885185472
rcx            0x55bf6ac0c640      94280618133056
rdx            0x75ae3b8a9         31589644457
rsi            0x0                 0

Uh-oh, it seems a wide-char string overwrote a pointer, a bad case of memory corruption! Have I miscalculated some bounds somewhere? My first hunch was to see what the corrupting string was so I may be able to pinpoint the source of the corruption. This wasn't really useful, because:

  • The string turned out to be the URL of one of Defender's many telemetry services
  • The string was cut in half by a NULL byte, indicating that I may be looking at a late state of the original corruption.

Looking at Ghidra's disassembly I also realized I'm neck-deep in the regex engine of the boost library, with no suspicious Windows API calls in sight (only 1-2 functions are visible in any given stack trace for some reason I haven't looked into). This seemed like a lot of trouble and I spent at least a full day investigating different dead ends I can't recall anymore. Then I remembered: the purpose of loadlibrary is to enable the use of analysis tools, so why not start using one (or two) already?!

Since I suspected heap corruption, my first tool of choice was AddressSanitizer (ASAN): our mock Windows API invokes plain old malloc() in place of HeapAlloc&co. so we can instrument and monitor memory allocations even inside mpengine.dll!
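The reason ASAN can see into the DLL's allocations is easy to picture with a sketch of such a mock. This is a simplified illustration, not the project's Heap.c: `mock_heap_alloc`, `mock_heap_free` and `MOCK_HEAP_ZERO_MEMORY` are hypothetical names, and the calling-convention attribute is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_HEAP_ZERO_MEMORY 0x00000008u /* value of HEAP_ZERO_MEMORY in the Windows SDK */

/* Hypothetical mock: the heap handle is ignored and the request goes to libc,
 * so ASAN (or any other malloc interposer) sees every allocation of the DLL. */
static void *mock_heap_alloc(void *heap, uint32_t flags, size_t bytes)
{
    (void)heap;
    if (flags & MOCK_HEAP_ZERO_MEMORY)
        return calloc(1, bytes); /* caller asked for zeroed memory */
    return malloc(bytes);        /* contents are indeterminate */
}

/* Hypothetical counterpart for HeapFree. */
static int mock_heap_free(void *heap, uint32_t flags, void *mem)
{
    (void)heap;
    (void)flags;
    free(mem);
    return 1; /* TRUE */
}
```

Note that without the zeroing flag the caller gets whatever malloc() hands out, which becomes relevant below.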

To my surprise, ASAN didn't catch anything: the crash occurred at exactly the same place without any prior indication of heap corruption. But! The memory layout turned out quite different:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff335eb96 in ?? ()
(gdb) i r
rax            0xbebebebebebebebe  -4702111234474983746
rbx            0x7fffffffcc10      140737488342032
rcx            0x61400002aa40      106927505975872
rdx            0x7ffff406e8a9      140737287481513
rsi            0x0                 0
...

(gdb) x/16x $rcx
0x61400002aa40: 0x00000000      0x00000000      0x00000000      0x00000000
0x61400002aa50: 0x00000000      0x00000000      0xbebebebe      0xbebebebe
0x61400002aa60: 0xbebebebe      0xbebebebe      0xbebebebe      0x00000000
0x61400002aa70: 0xbebebebe      0xbebebebe      0xbebebebe      0xbebebebe

I'm showing RCX because it points to the object where the values are copied from. The preceding code looks like this:

undefined __thiscall basic_regex_creator<> * __thiscall
                     boost::re_detail_500::basic_regex_creator<>::basic_regex_creator<>
                     (basic_regex_creator<> *this,regex_data<> *param_1)
                  assume GS_OFFSET = 0xff00000000
undefined         <UNASSIGNED>   <RETURN>
basic_regex_cr    RCX:8 (auto)   this
regex_data<> *    RDX:8          param_1
undefined8        Stack[0x8]:8   local_res8                              XREF[1]:     75a12bb28(W)  

75a12bb28 48 89 4c        MOV        qword ptr [RSP + local_res8],this
          24 08
75a12bb2d 53              PUSH       RBX
75a12bb2e 48 83 ec 20     SUB        RSP,0x20
75a12bb32 48 8b d9        MOV        RBX,this                        ; RBX := RCX (basic_regex_creator)
75a12bb35 48 89 11        MOV        qword ptr [this],param_1        ; Save regex_data ptr (RDX) to this
75a12bb38 48 8b 42 18     MOV        RAX,qword ptr [param_1 + 0x18]  ; RAX := [regex_data + 0x18]
75a12bb3c 48 89 41 08     MOV        qword ptr [this + 0x8],RAX      ; Save regex_data+0x18 to this
75a12bb40 33 d2           XOR        param_1,param_1
75a12bb42 48 89 51 10     MOV        qword ptr [this + 0x10],param_1
; ... further object initialization ...
75a12bb6d 48 8b 09        MOV        this,qword ptr [this]           ; Replace this (RCX) with saved regex_data
75a12bb70 48 8b 81        MOV        RAX,qword ptr [this + 0x160]
          60 01 00 00
75a12bb77 48 89 81        MOV        qword ptr [this + 0x168],RAX
          68 01 00 00
75a12bb7e 48 8b 03        MOV        RAX,qword ptr [RBX]             
75a12bb81 89 50 2c        MOV        dword ptr [RAX + 0x2c],param_1
75a12bb84 48 8b 43 08     MOV        RAX,qword ptr [RBX + 0x8]       ; RAX is regex_data+0x18 restored from the original basic_regex_creator object
75a12bb88 4c 8d 05        LEA        R8,[u+6]
          1b fd d0 00
75a12bb8f 48 8d 15        LEA        param_1,[w]
          13 fd d0 00
75a12bb96 48 8b 08        MOV        this,qword ptr [RAX]            ; CRASH when reading regex_data+0x18

The 0xbe bytes indicate uninitialized memory with ASAN instrumentation. This means that we don't overwrite memory, but use it uninitialized (and wide chars are just leftover trash)! Uninitialized memory is even more fun to debug, because you are hunting for something that didn't happen :) While it wouldn't directly solve our problem, it would be nice to at least see when those uninitialized bytes were allocated.
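The same trick can be imitated by hand when ASAN is not an option. Below is a minimal sketch of a poisoning allocator; `poison_alloc` and `field_is_poisoned` are hypothetical helpers, and 0xBE is chosen only to mirror the pattern in the dump above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xBE /* same pattern as in the ASAN run above */

/* Allocate and fill with the poison byte, so any field that is read before
 * being written shows up as 0xbebe... instead of stale heap data. */
static void *poison_alloc(size_t bytes)
{
    void *p = malloc(bytes);
    if (p != NULL)
        memset(p, POISON_BYTE, bytes);
    return p;
}

/* True if the 8-byte field at `offset` still holds nothing but poison. */
static int field_is_poisoned(const void *obj, size_t offset)
{
    uint64_t value;
    assert(obj != NULL);
    memcpy(&value, (const unsigned char *)obj + offset, sizeof value);
    return value == UINT64_C(0xBEBEBEBEBEBEBEBE);
}
```

With a 408-byte object, a check like `field_is_poisoned(obj, 0x18)` after construction would flag exactly the kind of never-written pointer field that ends up in RAX at the crash.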

This is where rr comes to save the day! rr is a time-travel debugger for Linux that records a program's execution so it can be replayed deterministically, allowing us to go back in time and investigate any crimes. This also means that during replay even heap allocations become predictable, so I set a conditional breakpoint on the mocked HeapAlloc() call that breaks only when the resulting buffer is allocated at the same address where the offending regex_data object is observed at the time of crash.

After 12 hits we get our crash again, but this time we can go back to the last relevant(!) HeapAlloc() call and take a look at the backtrace:

(rr) b HeapAlloc if dwBytes==408
...
Program received signal SIGSEGV, Segmentation fault.
0x000000075a12bb96 in ?? ()
(rr) reverse-continue
Continuing.

Breakpoint 1, HeapAlloc (hHeap=0x48454150, dwFlags=0, dwBytes=408) at winapi/Heap.c:35
35          if (dwFlags & HEAP_ZERO_MEMORY) {
(rr) finish
Run till exit from #0  HeapAlloc (hHeap=0x48454150, dwFlags=0, dwBytes=408) at winapi/Heap.c:35
0x000000075a7746cc in ?? ()
Value returned is $1 = (void *) 0x55bf6ac0c640
(rr) bt
#0  0x000000075a7746cc in ?? ()
#1  0x0000000048454150 in ?? ()
#2  0x0000000000000000 in ?? ()
(rr)

Carefully single-stepping from here we end up in a constructor of regex_data (in this particular case we could get here by static analysis too, but inheritance can make things tricky, esp. if we don't have symbols, which is frequently the case with mpengine.dll):

regex_data<> * __thiscall boost::re_detail_500::regex_data<>::regex_data<>(regex_data<> *this)

{
  regex_traits_wrapper<> *prVar1;
  LCID local_res10 [2];
  regex_traits_wrapper<> *local_res18;

  *(undefined8 *)this = 0;
  *(undefined8 *)(this + 8) = 0;
  *(undefined8 *)(this + 0x10) = 0;
  prVar1 = (regex_traits_wrapper<> *)operator_new(0x10);
  local_res18 = prVar1;
  local_res10[0] = GetUserDefaultLCID();
  object_cache<>::get((ulong *)prVar1,(__uint64)local_res10);
  std::shared_ptr<>::shared_ptr<><>((shared_ptr<> *)(this + 0x18),prVar1);
  *(undefined8 *)(this + 0x28) = 0;
  *(undefined8 *)(this + 0x30) = 0;
  *(undefined8 *)(this + 0x38) = 0;
  *(undefined8 *)(this + 0x40) = 0;
  *(undefined8 *)(this + 0x48) = 0;
  *(undefined4 *)(this + 0x50) = 0;
  memset(this + 0x54,0,0x100);
  *(undefined4 *)(this + 0x154) = 0;
  *(undefined8 *)(this + 0x168) = 0;
  *(undefined8 *)(this + 0x160) = 0;
  *(undefined8 *)(this + 0x158) = 0;
  *(undefined4 *)(this + 0x170) = 0;
  *(undefined8 *)(this + 0x178) = 0;
  *(undefined8 *)(this + 0x180) = 0;
  *(undefined8 *)(this + 0x188) = 0;
  *(undefined2 *)(this + 400) = 0;
  return this;
}

Now that this+0x18 reference looks awfully familiar from basic_regex_creator, and we even see a call to an external API! Here's the mock code:

STATIC DWORD GetUserDefaultLCID()
{
    //value of LOCALE_USER_DEFAULT
    DebugLog("");
    return 0x0400;
}

Do you see? No? That's cool, I didn't see it either. This method is awfully boring, so let's look at some assembly instead:

undefined __thiscall regex_data<>(regex_data<> * this)
                               assume GS_OFFSET = 0xff00000000
             undefined         <UNASSIGNED>   <RETURN>
             regex_data<> *    RCX:8 (auto)   this

75a12bc2c 48 89 4c        MOV        qword ptr [RSP + local_res8],this
          24 08
75a12bc31 53              PUSH       RBX
75a12bc32 55              PUSH       RBP
75a12bc33 56              PUSH       RSI
75a12bc34 57              PUSH       RDI
75a12bc35 48 83 ec 28     SUB        RSP,0x28
75a12bc39 48 8b f1        MOV        RSI,this
...             
75a12bc59 ff 15 11        CALL       qword ptr [->KERNEL32.DLL::GetUserDefaultLCID]
          51 cf 00
...
75a12bc74 48 8d 4e 18     LEA        this,[RSI + 0x18]
75a12bc78 e8 2b 01        CALL       std::shared_ptr<>::shared_ptr<><>
          00 00

The pointer (to offset 0x18 from this) is passed to shared_ptr via some arithmetic on RSI. Under the Microsoft x64 calling convention RSI is a callee-saved register, so it is preserved across WINAPI calls, as we can see in the prologue of this well-behaved function. But GetUserDefaultLCID() is not defined as WINAPI! It is compiled with the System V convention, where RSI is a scratch register that the callee is free to overwrite. Funnily enough, the clobbered value we get after the API call is also a writable memory address, so shared_ptr doesn't crash, just writes to some unrelated place (also not causing any problems), leaving our object field at +0x18 uninitialized.
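Which registers are exposed to this kind of mismatch can be worked out from the two ABIs. The sketch below lists the general-purpose registers a callee must preserve under each convention and reports the difference; the helper names are hypothetical, and the XMM registers are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* General-purpose registers a callee must preserve under each ABI. */
static const char *const ms_x64_preserved[] = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15",
};
static const char *const sysv_preserved[] = {
    "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15",
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static int in_list(const char *const *list, size_t n, const char *reg)
{
    assert(reg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], reg) == 0)
            return 1;
    }
    return 0;
}

/* True if Windows code may keep a live value in `reg` across a call,
 * while a System V callee is free to overwrite it. */
static int at_risk_across_mismatched_call(const char *reg)
{
    return in_list(ms_x64_preserved, COUNT(ms_x64_preserved), reg) &&
           !in_list(sysv_preserved, COUNT(sysv_preserved), reg);
}
```

Only RSI and RDI come out as at risk, which matches what happened here: the caller kept `this` in RSI across the call, and the unannotated mock was allowed to trash it.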

Adding the WINAPI macro to the declaration solves this problem - just like it fixed the much more obvious bugs in my new API's a couple of days back...

The Conclusion

If I had to give you one takeaway it'd be this: If you find a bug, look for its relatives!

It's quite frustrating to see such an easy-to-spot bug causing such a mess, but at least I got to sharpen my debugging skills and train my muscle memory with rr, a tool that I think doesn't get the appreciation it deserves.

If you are interested in loadlibrary we still have a fresh list of API's to mock, providing a great first task for new contributors!