In part three of this series, we discovered and traced a memory corruption bug in WinAPRS using IDA Pro and WinDbg, and found that it could be used to gain control of the CPU's EIP register and obtain remote code execution. Because there were limitations on the addresses that could be placed in EIP, we chose to target Windows XP to ensure we could build a working proof of concept and put our recently learned exploit development skills into practice. This installment reviews a three-stage shellcode payload that works around further limitations caused by a corrupted stack and ultimately spawns a reverse shell over ham radio.
This buffer overflow overwrites a large region of stack memory, so important pointers end up overwritten with garbage. As a result, I couldn't call many Win32 APIs, including CreateProcessA and ShellExecuteA. Calls to these APIs produced heap errors that crashed the shellcode. I couldn't find a single usable Win32 API that could execute shell commands. Building a functioning payload was a long process of trying different Win32 APIs to see which ones worked, then finding a combination that would produce the desired outcome: a reverse shell via ham radio. It also had to fit in the 783 bytes I had available in my exploit packet.
What finally worked was a three-stage payload delivered in two KISS packets. The first packet contained stages one and two combined into a single payload, and the second packet carried stage three. Stage one's primary job was to inject stage two into a different process with clean stack memory. Running there, stage two could call the Win32 APIs that failed inside the WinAPRS process after the overflow.
Stage One Shellcode
The first part of the stage one shellcode adjusts the stack pointers to make room for variables needed later in the shellcode. It then repairs a structure in memory that gets corrupted by the overflow, writing known-good values that were determined by reading the same memory manually while WinAPRS was running normally. If this structure is left corrupted, later calls to a few Win32 APIs will fail and the shellcode will not function.
Lines 38 - 108 are functions that are used to locate Win32 API pointers in memory for use later in the shellcode. This is something I learned from taking Offensive Security's new OSED course, which I highly recommend if you are new to this sort of thing and want to learn more about it. I won't review all the code here, but just know that you can hardcode a hash value that represents a Win32 API method, and these functions will find the address of the method so your shellcode can utilize it later.
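To make the idea of a hardcoded hash more concrete, here is a minimal sketch of one common way such a hash is computed. The rotate-right-by-13 scheme below is an assumption on my part; it is widely used for this purpose, but the text above does not say which hash the lookup functions actually use. The function names `ror32` and `api_name_hash` are hypothetical.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def api_name_hash(name: str) -> int:
    """Hash an export name: rotate the running hash right by 13, then add the next byte.

    Assumption: this mirrors a commonly used scheme, not necessarily the one
    used by the shellcode described in this article.
    """
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    # Print the 32-bit constants that would stand in for the API names.
    for api in ("OpenProcess", "VirtualAllocEx"):
        print(f"{api}: 0x{api_name_hash(api):08x}")
```

The point of the technique is size: a 4-byte constant replaces a full function-name string, and the lookup code compares hashes of export names against that constant.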
The next section of shellcode calls the lookup function nine times to resolve the addresses of nine different Win32 API methods such as OpenProcess and VirtualAllocEx. These are all used later in the shellcode to inject stage two into a separate clean process.
The next code block calls the Win32 CloseHandle function to close the file handle on the COM port. This frees it up for use in stage two later. On Windows XP, WinAPRS seems to always have a handle to the COM port available at a specific memory address which does not get overwritten by the overflow. This means the shellcode can pull the handle from that spot each time to close the COM port.
The stage two shellcode needs to be injected into a separate clean process. One process that should always exist is explorer.exe. Therefore, stage one needs a way to identify the process ID of explorer.exe. The next code block does just that by using three Win32 APIs.
First, it calls CreateToolhelp32Snapshot to take a snapshot of all running processes. Then, it calls Process32First to read the first entry in the resulting list. This entry won't be explorer.exe, so it is ignored. Next, the shellcode calls Process32Next repeatedly until it finds a process with the name “explorer.exe”. Once that is found, it reads the PID from the PROCESSENTRY32 structure that the call filled in and saves it for later.
Now that the PID of explorer.exe is known, the shellcode can obtain a handle to the process. This is accomplished with the Win32 OpenProcess API. The PID is used as an argument to the function. The resulting handle is saved to the stack for later.
Next, the shellcode uses the Win32 VirtualAllocEx API to allocate a new chunk of writable and executable memory in the remote explorer.exe process. VirtualAllocEx returns the memory address of the new chunk, which is saved to the stack for later.
Once new memory is allocated in explorer.exe, the shellcode calls the WriteProcessMemory API to copy the stage two shellcode to the new address. Before copying, however, it overwrites several DWORDs in stage two. Due to the limited space available in the first exploit packet, there's not enough room to include the Win32 API lookup functions in both stage one and stage two. Therefore, stage two needs to learn the addresses of its required Win32 functions some other way.
WinAPRS makes use of many Win32 APIs. The addresses of those used APIs are placed in WinAPRS' import table. The WinAPRS import table addresses never change between reboots, or even between OS versions. So even if the Win32 API addresses change, they can always be found at the same address in WinAPRS' import table. In the example below, the CreateFileA address will be stored at memory location 0x00760570.
The stage one shellcode copies the addresses from WinAPRS' import table and writes them into the stage two shellcode in specific locations. The stage two shellcode can then use these later without having to look them up. This saves space by not having to copy the many lines of lookup function code into stage two. This will become clear when we review the stage two shellcode next.
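The patching step can be modeled on plain byte buffers. The sketch below is only an illustration of the idea, not the actual shellcode: `patch_placeholders` is a hypothetical helper, and the offsets and address values in the demo are made up.

```python
import struct

PLACEHOLDER = 0xFFFFFFFF  # value each slot holds before it is patched


def patch_placeholders(stage_two: bytearray, offsets, addresses) -> bytearray:
    """Overwrite each placeholder DWORD in `stage_two` with a resolved address.

    `offsets` are byte positions of the placeholders (illustrative values only),
    `addresses` are the 32-bit values to write, in the same order.
    """
    if len(offsets) != len(addresses):
        raise ValueError("need exactly one address per placeholder offset")
    for offset, address in zip(offsets, addresses):
        (current,) = struct.unpack_from("<I", stage_two, offset)
        if current != PLACEHOLDER:
            raise ValueError(f"offset {offset} does not hold a placeholder")
        struct.pack_into("<I", stage_two, offset, address)  # little-endian DWORD
    return stage_two


if __name__ == "__main__":
    # Dummy buffer: two filler bytes, a placeholder, one filler byte, a placeholder.
    blob = bytearray(b"\x00\x00" + b"\xff" * 4 + b"\x00" + b"\xff" * 4)
    patch_placeholders(blob, [2, 7], [0x11223344, 0x55667788])
    print(blob.hex())
```

Checking that each slot still holds the placeholder before writing is a choice made for this sketch; it catches an offset that has drifted after the second-stage code was edited.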
Once the stage two shellcode is copied into explorer.exe's process memory, we need to execute it. The stage one shellcode uses the CreateRemoteThread Win32 API to accomplish this task. It passes in the explorer.exe process handle and memory address and tells the process to spawn a new thread from there. At this point, the stage two shellcode should begin to execute inside the explorer.exe process with clean memory.
Finally, the stage one shellcode calls the TerminateProcess Win32 API. This closes WinAPRS cleanly rather than letting it crash, since a crash might look more suspicious.
Stage Two Shellcode
Now that stage one is complete, it's time for stage two to take over. Stage two's job would normally be to create a reverse shell, but there wasn't enough space left to do that. Instead, its purpose is to read in the final third stage via ham radio and execute that to spawn the shell.
First, the stack is adjusted to make room for variables. Then, six DWORDs are moved to known locations on the stack. These DWORDs begin as 0xffffffff but are overwritten by the stage one shellcode with the addresses of various Win32 API functions. This way, the rest of stage two can use these APIs without spending space on lookup functions.
Next, the shellcode needs a way to obtain the stage three shellcode. Since all of this is happening via ham radio, we can't assume that the victim machine has an internet connection. We want to send stage three over the radio. That means that the shellcode needs to open the COM port, though we don't know which COM port is the right one. The next chunk of shellcode attempts to open the COM port using the Win32 CreateFileA API. It starts with COM1. If that fails, it tries COM2, and so on. If it fails after COM9, it will just keep on going in an infinite loop. Hopefully the victim TNC is attached to the lowest available COM port.
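The probing logic is easy to express at a higher level. The sketch below is a simplified model, not the shellcode itself: `find_serial_port` and the `try_open` callback are hypothetical, and unlike the shellcode, which keeps looping, this version gives up after the last port number.

```python
from typing import Callable, Optional, Tuple


def find_serial_port(
    try_open: Callable[[str], Optional[object]],
    first: int = 1,
    last: int = 9,
) -> Optional[Tuple[str, object]]:
    """Try COM<first>..COM<last> in order and return the first that opens.

    `try_open` is a caller-supplied function (an assumption of this sketch)
    that returns a handle on success or None on failure.
    """
    for number in range(first, last + 1):
        name = f"COM{number}"
        handle = try_open(name)
        if handle is not None:
            return name, handle
    return None  # bounded here for clarity; the shellcode does not stop


if __name__ == "__main__":
    # Fake opener for demonstration: pretend only COM3 has a device attached.
    def fake_open(name: str) -> Optional[object]:
        return "handle" if name == "COM3" else None

    print(find_serial_port(fake_open))
```

Because the search returns the first port that opens, a machine with another device on a lower-numbered port would be misidentified, which is the risk the paragraph above points out.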
Next, we need to be sure the COM port is configured for the correct baud rate. This will depend on each victim machine, but the shellcode is hardcoded to configure the COM port to 9600 baud, which is what my TNC uses. In theory, this step shouldn't be necessary, since the COM port should keep whatever settings WinAPRS had configured. In practice, I found that COM port operations sometimes failed until I performed this step, so it's included for reliability.
The Win32 GetCommState API is called, which returns a structure containing the COM port's current configuration. The DWORD containing the baud rate setting is overwritten. Then SetCommState is called to update the configuration with the correct settings.
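The "overwrite one DWORD" step can be shown on a raw buffer. In the Win32 DCB structure, the first DWORD is DCBlength and the second is BaudRate, so the baud rate sits at byte offset 4. The sketch below models only that overwrite on a byte buffer; it does not call GetCommState or SetCommState, and `set_baud_rate` is a hypothetical helper.

```python
import struct

DCB_SIZE = 28          # size of the Win32 DCB structure in bytes
BAUD_RATE_OFFSET = 4   # BaudRate is the second DWORD, after DCBlength


def set_baud_rate(dcb: bytearray, baud: int) -> bytearray:
    """Overwrite the BaudRate DWORD in a raw DCB buffer, leaving the rest untouched."""
    if len(dcb) < BAUD_RATE_OFFSET + 4:
        raise ValueError("buffer too small to hold a BaudRate field")
    struct.pack_into("<I", dcb, BAUD_RATE_OFFSET, baud)
    return dcb


if __name__ == "__main__":
    # Stand-in for the buffer a GetCommState call would have filled in.
    dcb = bytearray(DCB_SIZE)
    struct.pack_into("<II", dcb, 0, DCB_SIZE, 1200)  # DCBlength, old BaudRate
    set_baud_rate(dcb, 9600)
    print(struct.unpack_from("<II", dcb, 0))
```

Reading the current configuration first and changing only one field means every other setting (parity, stop bits, flow control) stays as the original application left it.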
Now that the COM port is configured correctly, the shellcode uses the Win32 ReadFile API to read one byte at a time from the COM port. It continues reading until it detects two 0xC0 characters. These are the KISS control characters which denote the beginning and end of the KISS packet. After it detects two 0xC0 characters, the shellcode knows that it has obtained the entire packet.
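The read loop amounts to counting frame delimiters. Here is a minimal sketch of that logic over any iterable of byte values; `read_kiss_frame` is a hypothetical name, and returning only the bytes between the two delimiters is a choice made for this sketch rather than something the text above specifies.

```python
from typing import Iterable

FEND = 0xC0  # KISS frame delimiter


def read_kiss_frame(byte_source: Iterable[int]) -> bytes:
    """Consume bytes one at a time and return what lies between two FEND bytes."""
    frame = bytearray()
    fends_seen = 0
    for value in byte_source:
        if value == FEND:
            fends_seen += 1
            if fends_seen == 2:
                return bytes(frame)  # second delimiter: frame is complete
            continue
        if fends_seen == 1:
            frame.append(value)      # only collect bytes inside the frame
    raise EOFError("stream ended before a complete frame was read")


if __name__ == "__main__":
    stream = bytes([0x01, FEND, 0x00]) + b"hello" + bytes([FEND, 0x02])
    print(read_kiss_frame(stream))
```

One caveat of a plain two-delimiter counter: a stream that sends two FEND bytes back to back would produce an empty frame, so a more robust reader would skip empty frames and keep going.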
The shellcode then closes the COM port handle to free up the COM port for stage three.
Finally, it jumps to the buffer containing the newly acquired stage three shellcode. It skips the first 26 bytes which include the AX.25 addressing data.
Stage Three Shellcode
The third stage of the payload contains the actual reverse shell code. This code is too long to fit in stage two, which is why stage two merely reads in the third stage and jumps to it. With all the extra space, stage three begins with the same Win32 lookup functions that stage one had. It then performs its own lookups for the Win32 APIs it will use later.
Next, it creates anonymous pipes which will be used later with cmd.exe to read output from the terminal and write user input to it. This is done with the CreatePipe API.
The shellcode next needs to call the CreateProcessA API, and one of that function's arguments is a STARTUPINFOA structure. The shellcode therefore builds this structure on the stack, including handles to two of the pipes it just created. The API call launches a new cmd.exe process whose input and output are redirected to those pipes.
Next, two unnecessary handles are closed using the Win32 CloseHandle API.
The shellcode needs a place to store the input and output, so VirtualAlloc is called to allocate some memory.
The next code block is reused from stage two. It attempts to open the COM ports again, starting with COM1 and working its way up until it finds one that works. It also reconfigures the baud rate again for reliability.
Next is where the main remote shell loop happens. The shellcode calls the Win32 Sleep API to wait for one second for the shell command to complete. Then it reads data from the cmd.exe output pipe into a buffer.
KISS control characters are added to the buffer, and that final KISS packet is written to the serial port using the WriteFile API. At this point, the cmd.exe output should be transmitted over the airwaves back to the attacker.
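For reference, framing a buffer for a KISS TNC looks roughly like the sketch below. It follows the standard KISS framing rules (delimiter 0xC0, command byte 0x00 for a data frame, and escape sequences for any delimiter or escape bytes in the payload). Whether the shellcode performs the escaping step is not stated above, so treat that part as an assumption; `kiss_wrap` is a hypothetical name, and any AX.25 header the payload needs is left out.

```python
FEND = 0xC0   # frame delimiter
FESC = 0xDB   # escape byte
TFEND = 0xDC  # escaped form of FEND
TFESC = 0xDD  # escaped form of FESC


def kiss_wrap(payload: bytes, command: int = 0x00) -> bytes:
    """Wrap `payload` in a KISS frame: FEND, command byte, escaped data, FEND."""
    out = bytearray([FEND, command])
    for value in payload:
        if value == FEND:
            out += bytes([FESC, TFEND])
        elif value == FESC:
            out += bytes([FESC, TFESC])
        else:
            out.append(value)
    out.append(FEND)
    return bytes(out)


if __name__ == "__main__":
    print(kiss_wrap(b"dir output").hex())
```

Escaping matters for arbitrary command output: a stray 0xC0 byte in the data would otherwise be read by the TNC as the end of the frame.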
The next code block uses the ReadFile API to read from the serial port until two KISS control characters are detected. This is essentially the same code that stage two used to read stage three. This time, the buffer will contain shell commands from the attacker.
The shellcode then adds carriage return characters to the attacker's commands and writes them to the cmd.exe process's input pipe. This allows the attacker to execute shell commands as if they were sitting in front of the computer (though with some limitations).
The shellcode then loops forever so the attacker can enter as many commands as they like.
At this point we have an exploitable memory corruption vulnerability. We have an exploitable target system running Windows XP SP3. We have shellcode that will hopefully spawn a reverse shell over ham radio using the victim machine’s TNC. The last step is to incorporate the shellcode into a final exploit script to run on our attacking machine. In the next installment, we’ll do just that.