Lingua Diabolis

May 09, 2025

Fuzzing Windows Defender with loadlibrary in 2025

As I showed in the previous post, I recently started to put considerable effort into making loadlibrary work with recent versions of Windows Defender. I'm glad to report that this week I managed to put together a proof-of-concept fuzzing setup driving version 1.1.25020.1007 of mpengine.dll! This post is my brain dump of the process so far. The first part is about the Lua VM that is probably less interesting when it comes to memory corruptions, but if red teamers pay close attention, they may find some useful techniques and ideas ;) The second part is focused on fuzzing, so if you are only interested in that topic, feel free to skip ahead using the TOC!


Code in this post is from version 1.1.24090.11 of mpengine.dll, fuzzing results will be presented with 1.1.25020.1007.

Table of Contents

Inside Defender's Lua VM

At the time of my last post, we got to the point where we could make mpclient_x64 detect the EICAR test file. After a short happy dance, I ran another test with an old malware sample collection in a RAR archive. This way I could test more complex detections and - most importantly - exercise some of the unpacker functionality of Defender. Unpacker components - used in AV engines to strip layers of compression, executable packing, etc. - are prone to vulnerabilities due to their complexity and the often subpar quality of their third-party components.

This second test provided a troubling result: instead of a segmentation fault - that is usually easy to trace back to a missing mock API - I got an unhandled C++ exception that ended up trapping in mpclient. I traced back the exception using rr, and found that it was thrown from the Lua virtual machine embedded inside mpengine.dll:

void __cdecl luaV_gettable(lua_State *state,lua_TValue *t,lua_TValue *key,lua_TValue *val){
//...
if ((lua_TValue *)callinfo[2] <= key_scalar_val) {
    luaG_runerror(state,"attempt to %s a %s value","index",type_name); // "attempt to index a nil value", throws exception
    pcVar7 = (code *)swi(3);
    (*pcVar7)();
    return;
}
//...
}

A large part of Defender's signatures is implemented in Lua, most likely to support rapid development of complex signature logic. The format of signatures (distributed in .VDM files) has been reverse engineered, and multiple versions of extraction tooling are available. I couldn't track the full history of this research; some notable resources are:

Additionally, Making a Lua Bytecode parser in Python by cpunk was the most helpful resource for me to get up to speed with Lua bytecode. This writeup led me to the following additional Lua resources:

I used commial's scripts to decompress the .VDM content. This way Lua bytecode became directly visible, but headers expected by e.g. luadec weren't fully present (Lua bytecode chunks can still be easily identified by the '\x1bLua' magic value). This result was good enough to identify the failing bytecode based on the bytecode processed right before the exception. Initially I used a partial bytecode sequence cube0x8 extracted by peeking into the execution of LuaStandalone::AddScript(). Later I tapped into luaV_execute(), the main execution loop of the VM, to observe the execution of instructions individually.
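To illustrate the identification step, here is a minimal sketch that scans a decompressed blob for the '\x1bLua' magic. The function name and the sample buffer are made up for this example, and nothing is assumed about the rest of the (partly missing) chunk header.

```python
LUA_MAGIC = b"\x1bLua"  # magic value mentioned in the post


def find_lua_chunks(blob: bytes) -> list[int]:
    """Return the offset of every occurrence of the Lua magic in blob."""
    offsets = []
    pos = blob.find(LUA_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(LUA_MAGIC, pos + 1)
    return offsets


# Made-up buffer standing in for decompressed .VDM content.
sample = b"\x00\x01" + LUA_MAGIC + b"\x51" + b"junk" + LUA_MAGIC + b"\x51"
print([hex(offset) for offset in find_lua_chunks(sample)])
```

Each reported offset is only a candidate chunk start; whether the bytes after it parse as bytecode still has to be checked with a real parser.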


luaV_execute()'s control-flow graph, generated by function-graph-overview

As you can see in the CFG above, this function is a true beast. Fortunately, not only can we access public symbols from Microsoft (usually with a couple of weeks' delay after a new engine version is released), but we can also match the compiled luaV_gettable() function closely with Lua 5.1.5's official source (although there's lots of inlining, and later I also found functions that should be deprecated in 5.1). I ended up hooking the following code in luaV_execute():

luaV_execute():
; ...
75a16bbdc 4c 8b f1        MOV        R14,RCX ; R14 := lua_state (first argument for luaV_execute) 
75a16bbdf 49 8b 46 28     MOV        RAX,qword ptr [R14 + 0x28]
75a16bbe3 4d 8b 66 30     MOV        R12,qword ptr [R14 + 0x30] ; R12 := state->savedpc
; ...
75a16bc07 41 8b 1c 24     MOV        EBX,dword ptr [R12] ; EBX := Lua instruction (4 bytes)

LuaPytecode was useful at this point, because it could be easily hacked to ignore missing headers and print out any successfully parsed opcodes - the exception happened while the VM was processing OP_SELF:

VM version 51
instructions: 21
00000005  GETGLOBAL :   R[0]   K[0]
00404006   GETTABLE :   R[0]      0   K[1]
00008045  GETGLOBAL :   R[1]   K[2]
00C0C046   GETTABLE :   R[1]      1   K[3]
00008085  GETGLOBAL :   R[2]   K[2]
01410086   GETTABLE :   R[2]      2   K[4]
0100005C       CALL :      1      2      0
0000801C       CALL :      0      0      2
0041400B       SELF :   R[0]      0   K[5]
...
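The raw words in the first column can be decoded by hand as well. The sketch below assumes the stock Lua 5.1 instruction encoding (6-bit opcode, 8-bit A, 9-bit C, 9-bit B, with operands of 256 and above indexing the constant table); the dump above is consistent with that layout, but I have not verified that Defender's VM leaves every opcode untouched. The helper names are my own.

```python
# Opcode order as in Lua 5.1's lopcodes.h (assumed unmodified in Defender's VM).
OPCODES = [
    "MOVE", "LOADK", "LOADBOOL", "LOADNIL", "GETUPVAL", "GETGLOBAL",
    "GETTABLE", "SETGLOBAL", "SETUPVAL", "SETTABLE", "NEWTABLE", "SELF",
    "ADD", "SUB", "MUL", "DIV", "MOD", "POW", "UNM", "NOT", "LEN", "CONCAT",
    "JMP", "EQ", "LT", "LE", "TEST", "TESTSET", "CALL", "TAILCALL", "RETURN",
    "FORLOOP", "FORPREP", "TFORLOOP", "SETLIST", "CLOSE", "CLOSURE", "VARARG",
]


def decode(ins: int):
    """Split a 32-bit Lua 5.1 instruction into (name, A, B, C, Bx)."""
    op = ins & 0x3F
    a = (ins >> 6) & 0xFF
    c = (ins >> 14) & 0x1FF
    b = (ins >> 23) & 0x1FF
    bx = (ins >> 14) & 0x3FFFF  # B and C read as one 18-bit field
    return OPCODES[op], a, b, c, bx


def rk(operand: int) -> str:
    """Render a B/C operand: values >= 256 refer to the constant table."""
    return f"K[{operand - 256}]" if operand >= 256 else f"R[{operand}]"


name, a, b, c, _ = decode(0x0041400B)  # the failing instruction
print(name, f"R[{a}]", b, rk(c))
```

For 0x0041400B the C operand is 261, i.e. constant #5, which is why the symbol table matters for the next step.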

Unfortunately, without symbols it's hard to tell what exactly is being called, but to my slight surprise this script can still restore "standard" Lua headers from .VDM contents, so we can get all the symbols with their indexes:

0: [STRING] MpCommon
1: [STRING] PathToWin32Path
2: [STRING] mp
3: [STRING] getfilename
4: [STRING] FILEPATH_QUERY_FULL
5: [STRING] lower
6: [STRING] find
...

... and even use luadec:

local l_0_0 = ((MpCommon.PathToWin32Path)((mp.getfilename)(mp.FILEPATH_QUERY_FULL))):lower()

Why would you need code like this in a malware signature? 🤔

Based on the bytecode and symbols we can see that the call to lower() is failing (in Lua, a method call is a lookup in a key-value store: OP_SELF references the K[5] key, and symbol #5 is "lower"). This means that MpCommon.PathToWin32Path must have returned nil. This can be further confirmed by breaking on the failing luaV_gettable() call and dumping the Lua string:

Breakpoint 1, 0x000000075a164ba0 in ?? ()                 
              ^ luaV_gettable()

(rr) x/2gx $r8 
0x7fde6bede9cf: 0x00007fde718c61b5      0x0000000000000004 
                ^ lua_TValue pointer    ^ lua_TValue type (4=string)

(rr) x/s *((void**)$r8)+0x18                              
0x7fde718c61cd: "lower"                                   
                ^ ASCII value starts at 0x18
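The 0x18 offset matches what the stock Lua 5.1 TString header would occupy on a 64-bit target. The sketch below rebuilds that layout over a fabricated memory buffer; the field layout is an assumption taken from upstream Lua 5.1 (only the 0x18 data offset is confirmed by the debugger session above), and the helper name is made up.

```python
import struct

LUA_TSTRING = 4  # type tag for strings in Lua 5.1's lua.h

# Assumed 64-bit TString header: next pointer, tt, marked, reserved,
# one padding byte, hash, length. The string data follows the header.
TSTRING_HEADER = "<QBBBxIQ"
HEADER_SIZE = struct.calcsize(TSTRING_HEADER)  # 24 == 0x18


def read_tstring(mem: bytes, addr: int = 0) -> str:
    """Read a Lua string object located at addr inside a memory dump."""
    _next, tt, _marked, _reserved, _hash, length = struct.unpack_from(
        TSTRING_HEADER, mem, addr
    )
    if tt != LUA_TSTRING:
        raise ValueError(f"not a string object (tt={tt})")
    start = addr + HEADER_SIZE
    return mem[start:start + length].decode("ascii")


# Fabricated object; pointer and hash values are placeholders.
fake = struct.pack(TSTRING_HEADER, 0, LUA_TSTRING, 0, 0, 0, 5) + b"lower\x00"
print(read_tstring(fake))
```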

The MpCommon and mp tables are of course defined by mpengine.dll to provide an interface to some of the engine's native functionality. For example, MpCommon.PathToWin32Path references the native LsaMpCommonLib::PathToWin32Path() function inside mpengine.dll. This function normalizes the filename, which mp.getfilename() (implemented by lua_mp_getfilename()) can return in a variety of formats. After spending what felt like an eternity trying to find any of our mocked WINAPI calls on this code path that could make PathToWin32Path() fail, I realized that while this function is only supposed to work with absolute paths, mpclient_x64 (not the mock API!) always provides "input" as the input file name - why would that matter, right? Well, adding a drive letter prefix to the string instantly made mpclient_x64 detect a bunch of malware in the RAR archive!

Unfortunately now I ended up with another C++ exception... Repeating the previously described process, I ended up with this failing Lua code:

local l_0_0 = (mp.readfile)((pe.foffset_rva)(pehdr.AddressOfEntryPoint) - 918, 768)

The problem was that pe.foffset_rva (implemented by lua_pe_foffset_worker()) returned a value less than 918 for the PE entry point offset, making mp.readfile (implemented by lua_mp_readfile()) fail because of the negative offset parameter. This is significant, because - as Ange Albertini helpfully confirmed - nothing guarantees that the entry point offset of a PE file is greater than 918, so this Lua script will fail for a lot of inputs (presumably gracefully, as there is likely a proper exception handler in the original Defender).
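To see how easily this happens, here is a small sketch using the usual PE section mapping (file offset = RVA - VirtualAddress + PointerToRawData). This is the textbook translation, not necessarily what lua_pe_foffset_worker() does internally, and the section values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int


def rva_to_file_offset(rva: int, sections: list[Section]):
    """Translate an RVA to a file offset, or None if no section covers it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None


# Hypothetical PE: the entry point sits at the very start of the first
# section, whose raw data begins right after 0x200 bytes of headers.
text = Section(virtual_address=0x1000, virtual_size=0x800, pointer_to_raw_data=0x200)
entry_offset = rva_to_file_offset(0x1000, [text])
read_offset = entry_offset - 918  # same arithmetic as the Lua snippet

print(entry_offset, read_offset)
```

With these values the entry point lands at file offset 512, so the signature asks for a read starting at a negative offset.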

At this point it seemed best to simply cut out Lua from the equation: while the VM implementation and the native functions provided for it can certainly be buggy, creating exploits for such bugs without controlling the bytecode (signature files) seems like a pretty long shot. A Galaxy Brain solution is of course to create malware for which MS will create Lua code that sets the stage for the real exploit just right ;)

In the next section I'll present a simple fuzzing setup that is capable of driving the latest 64-bit mpengine.dll version on Linux and exercising unpacker logic while not wasting (too many) cycles on Lua execution.

Fuzzing

Based on the previous section's conclusions, fuzzing with the Lua VM would make our target very unstable while wasting resources on a less attractive attack surface, so let's get rid of it! Thankfully, loadlibrary includes hooking code, which cube0x8's x64 branch extends with 64-bit support. With this, we can just replace luaV_execute() with a function that does nothing:

void my_lua_exec(){
  return;
}

int main(int argc, char** argv){
  // ...
  insert_function_redirect((void*)luaV_execute_address, my_lua_exec, HOOK_REPLACE_FUNCTION);
  // ...
}

This change broke the detection of even the simplest EICAR pattern, but the output indicated that the unpackers still work:

$ ./mpclient_x64 ../eicar.7z 2>&1 | fgrep Callback     
pelinker (import:285): unknown symbol: KERNEL32.dll:FreeLibraryWhenCallbackReturns
WaitForThreadpoolTimerCallbacks(): 0x41414141, 1                                  
WaitForThreadpoolTimerCallbacks(): 0x41414141, 1                                  
WaitForThreadpoolTimerCallbacks(): 0x41414141, 1                                  
WaitForThreadpoolTimerCallbacks(): 0x41414141, 1                                  
EngineScanCallback(): Scanning C:\mpclient.input                                  
EngineScanCallback(): Packer nUFS_7z identified.                                  
EngineScanCallback(): Scanning C:\mpclient.input->eicar.com                       

Before moving forward with integrating the fuzzer, I also wanted to see how hard it is to adapt our improved mpclient_x64 to the latest version of mpengine.dll. So far I had used a 2024 version of the DLL because more recent ones crashed during initialization - now it was time to debug that problem. The issue was with Defender's local cache: a bunch of files starting with the mpcache- prefix, located under C:\ProgramData\Microsoft\Windows Defender\Scans on a Windows system. While in 2024 the engine didn't complain about this, the 2025 version not only wanted to open these files, but attempted to map them to memory and check their contents. The change turned out to be in the dwCreationDisposition flags in the CreateFile call used for opening the cache files: earlier our mock API returned a NULL handle, while with the new flags it returned a handle to /dev/null. Tweaking our implementation to return NULL if the filename includes "mpcache-" made the engine bail out from cache initialization and resume normally.

With this somewhat stable, up-to-date implementation we can try to execute a really simple fuzzing run:

afl-fuzz -Q -i ~/input/ -o ~/output/ -t 17000  -- ./mpclient_x64 @@

Our first attempt is with AFL++ QEMU mode - we need a binary-only instrumentation method as we can't recompile mpengine.dll... The seed was a single zipped EICAR file. The timeout had to be increased significantly: bare metal execution itself is pretty slow, and emulation made it even slower. Note that this is non-persistent mode, so this number could be reduced considerably by running engine initialization only once. The bigger problem is that QEMU didn't seem to provide coverage information. I haven't investigated this; my theory is that the weird loader or the mixed calling conventions could be the cause of the problem.

Sad AFL++

Given my past experience, I decided to bring out the big guns and do hardware-based coverage tracking with Intel PT. For this I switched to honggfuzz, which I remembered to be easy to set up with IPT, and I wasn't disappointed:

../honggfuzz/honggfuzz -i ~/input/ -W ~/workspace/  --linux_perf_ipt_block -t 10 -- ./mpclient_x64 ___FILE___

Non-persistent honggfuzz execution with IPT

IPT seems to collect coverage data as expected, and while performance isn't great (~4 s/execution), it's already better than QEMU! What if we implemented persistent mode? Honggfuzz offers two ways to do this: ASAN-style and HF_ITER-style. Since our build scripts rely on GCC-specific compiler flags we can't immediately use ASAN-style, but HF_ITER fits the original structure of mpclient_x64 better anyway: all we have to do is load input data by calling the HF_ITER() function of the fuzzer, then feed this data to mpengine's __rsignal(). A minor issue is that __rsignal() expects a stream handle (FILE* when called from mpclient_x64) instead of a pointer to a byte array. This can be easily resolved by using the fmemopen library function, which can open a stream to a memory buffer. The persistent fuzzer loop looks something like this:

    for (;;) {
        size_t len;
        uint8_t *buf;
        HF_ITER(&buf, &len);

        ScanDescriptor.UserPtr = fmemopen(buf, len, "r");

        if (__rsignal(&KernelHandle, RSIG_SCAN_STREAMBUFFER, &ScanParams, sizeof ScanParams) != 0) {
          // ...
        }
        //...
    }

To confirm that the solution actually works I started to monitor execution output using the -Q option of honggfuzz:

Fuzzed output

We can see that the filename extracted from the seed archive changes frequently, confirming that different inputs hit the ZIP unpacker. On the performance side, eliminating initialization resulted in radical improvement, achieving hundreds of executions per second:

Persistent fuzzing with honggfuzz and IPT

I consider this a success: two months ago the 2025 engine wouldn't even initialize and even older engines crashed on the simplest of inputs. Now we are able to get stable results for millions of fuzzer runs (I haven't executed a longer campaign yet) with the latest 64-bit engine. I also hope that with these writeups future maintenance work becomes easier and more accessible for new contributors.

There are of course a lot of problems with this proof-of-concept setup - some examples off the top of my head:

  • The corpus generated by honggfuzz is a bit weird, e.g. it includes long, meaningful strings that can't occur randomly; the source of these must be investigated to rule out bugs in the fuzzing setup.
  • A floating point exception occurs frequently as inputs get more complex, this needs to be eliminated to reduce noise (last minute update: this comes from the .NET emulator, sounds like a follow-up topic!).
  • mpclient_x64 is still wasting considerable resources on debug logging, sub-optimal Lua engine hooking, etc.

These problems look manageable and are mostly part of any fuzzing project. Next Patch Tuesday will be a good test of the difficulty of adjusting mpclient_x64 to engine updates.

Relevant code is available in this Git branch. Contributions - including bug reports - are welcome as always!

Thanks to WaffleSec for reviving the work on loadlibrary, and even bigger thanks to cube0x8 for continuous technical (and emotional :)) support that made this project come together!

PSA: I'm open for work. If this kind of research (or other stuff I do) is interesting to you, feel free to reach out!