Chapter 3. Other debugging techniques

3.1. Understanding undocumented APIs

Some background: On the i386 class of machines, stack entries are usually dword (4 bytes) in size, little-endian. The stack grows downward in memory. The stack pointer, maintained in the esp register, points to the last valid entry; thus, the operation of pushing a value onto the stack involves decrementing esp and then moving the value into the memory pointed to by esp (i.e., push p in assembly resembles *(--esp) = p; in C). Removing (popping) values off the stack is the reverse (i.e., pop p corresponds to p = *(esp++); in C).
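
The push and pop semantics above can be modelled in a few lines of C. This is only a sketch: the array stack_mem, the pointer esp, and the helpers push32 and pop32 are hypothetical names that stand in for real stack memory and the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a small array stands in for stack memory, and
 * `esp` is an ordinary pointer into it rather than a CPU register. */
#define STACK_DWORDS 16

static uint32_t stack_mem[STACK_DWORDS];
static uint32_t *esp = stack_mem + STACK_DWORDS; /* empty stack */

/* push p:  *(--esp) = p;  the stack grows toward lower addresses. */
static void push32(uint32_t p)
{
    assert(esp > stack_mem);                /* no room left */
    *(--esp) = p;
}

/* pop p:  p = *(esp++); */
static uint32_t pop32(void)
{
    assert(esp < stack_mem + STACK_DWORDS); /* nothing to pop */
    return *(esp++);
}
```

Pushing 30 and then 70 leaves esp pointing at 70; two pops then return 70 and 30, in that order, and restore esp to its starting value.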

In the stdcall calling convention, arguments are pushed onto the stack right-to-left. For example, the C call myfunction(40, 20, 70, 30); is expressed in Intel assembly as:

push 30
push 70
push 20
push 40
call myfunction

The called function is responsible for removing the arguments from the stack. Thus, before the call to myfunction, the stack would look like:

         [local variable or temporary]
         [local variable or temporary]
          30
          70
          20
esp ->    40

After the call returns, it should look like:

         [local variable or temporary]
esp ->   [local variable or temporary]

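To restore the stack to this state, the called function must know how many arguments to remove, which is the number of arguments it takes. This is a problem if the function is undocumented: a stub or replacement for it must remove the same number of arguments, and that number is not known.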

One way to determine the number of arguments each function takes is to create a wrapper around that function that detects the change in the stack pointer. Each wrapper assumes that the function takes a large number of arguments. The wrapper copies that many arguments onto its own stack, calls the actual function, and then calculates the number of arguments the function removed by comparing esp before and after the call.
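
The arithmetic behind the wrapper can be illustrated with a simulation. Reading the real esp requires inline assembly and a live i386 process, so the sketch below only models the stack pointer as an integer. The names sim_esp, fake_stdcall, and count_args_via_esp are hypothetical, and MAX_ARGS is an arbitrary choice of "large number".

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARGS 64  /* the "large number" of arguments the wrapper assumes */

/* Simulated stack pointer; a real wrapper would read the esp register. */
static uint32_t sim_esp = 0x00100000u;

/* Models a stdcall function taking `nargs` arguments: `call` pushes the
 * return address, and `ret nargs*4` pops it and discards the arguments. */
static void fake_stdcall(unsigned nargs)
{
    sim_esp -= 4;               /* call: push return address   */
    sim_esp += 4;               /* ret:  pop return address    */
    sim_esp += 4u * nargs;      /* ret n: remove the arguments */
}

/* Wrapper: push MAX_ARGS dummy arguments, call the function, and infer
 * the argument count from how far esp moved across the call. */
static unsigned count_args_via_esp(unsigned true_nargs)
{
    uint32_t saved = sim_esp;
    uint32_t before, after;

    assert(true_nargs <= MAX_ARGS);
    sim_esp -= 4u * MAX_ARGS;   /* push MAX_ARGS dwords */
    before = sim_esp;
    fake_stdcall(true_nargs);
    after = sim_esp;

    sim_esp = saved;            /* drop the dummy arguments that remain */
    return (after - before) / 4u;
}
```

In the simulation the true count is passed in only so that fake_stdcall can behave like the unknown function; the wrapper itself derives its answer purely from the two esp readings.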

The main problem with this scheme is that the function must actually be called from another program, and many of these functions are seldom used. An attempt was made to aggressively query each function in a given library (ntdll.dll) by passing 64 arguments, all 0, to each function. Unfortunately, Windows NT quickly crashes with a blue screen of death, even if the program is run from a non-administrator account.

Another method that has been much more successful is to inspect the return instruction of each function, since in stdcall that instruction is what removes the arguments from the stack. This instruction, ret hhll (where hhll is the number of bytes to remove, i.e. the number of arguments times 4), is encoded as the bytes 0xc2 ll hh in memory. It is a reasonable assumption that few, if any, functions take more than 16 arguments; therefore, simply searching for hh == 0 and ll <= 0x40 starting from the address of a function yields the correct number of arguments most of the time.
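
A minimal sketch of such a scan in C follows. The function name guess_arg_count and the -1 "not found" result are hypothetical, and the bound is taken as 0x40 inclusive so that a 16-argument function is still detected; that choice is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define RET_IMM16     0xc2  /* opcode of "ret imm16" */
#define MAX_ARG_BYTES 0x40  /* 16 arguments * 4 bytes (inclusive bound assumed) */

/* Scans `len` bytes of code, starting at a function's entry point, for
 * the pattern 0xc2 ll 0x00 with ll <= MAX_ARG_BYTES.  Returns the guessed
 * argument count (ll / 4), or -1 if no such pattern is found. */
static int guess_arg_count(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i + 2 < len; i++) {
        if (code[i] == RET_IMM16 && code[i + 2] == 0x00 &&
            code[i + 1] <= MAX_ARG_BYTES)
            return code[i + 1] / 4;
    }
    return -1;
}
```

For a body such as 55 8b ec 5d c2 0c 00 (push ebp; mov ebp, esp; pop ebp; ret 12) the sketch reports three arguments. A function that takes no arguments normally ends in a plain ret (0xc3), which this sketch does not look for.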

Of course, this is not without errors. ret 00ll is not the only instruction that can contain the byte sequence 0xc2 ll 0x0; for example, push 0x000040c2 has the byte sequence 0x68 0xc2 0x40 0x0 0x0, which matches the above. Properly, the utility should look for this sequence only on an instruction boundary; unfortunately, finding instruction boundaries on an i386 requires implementing a full disassembler -- quite a daunting task. Besides, the probability of such a byte sequence occurring anywhere other than in an actual return instruction is fairly low.
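
The false positive can be reproduced with the bytes of the push instruction quoted above. In this sketch the helper find_ret_pattern is a hypothetical name, and the bound ll <= 0x40 is assumed to be inclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first 0xc2 ll 0x00 pattern with ll <= 0x40,
 * or -1 if there is none.  No attempt is made to find instruction
 * boundaries, which is exactly the weakness being illustrated. */
static long find_ret_pattern(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i + 2 < len; i++) {
        if (code[i] == 0xc2 && code[i + 1] <= 0x40 && code[i + 2] == 0x00)
            return (long)i;
    }
    return -1;
}

/* push 0x000040c2: opcode 0x68 followed by the little-endian immediate. */
static const unsigned char push_imm[] = { 0x68, 0xc2, 0x40, 0x00, 0x00 };
```

Calling find_ret_pattern(push_imm, sizeof push_imm) finds a match at offset 1, inside the immediate operand rather than at the start of an instruction, so a scan would report 16 arguments for a function that merely pushes this constant.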

Much more troublesome is the non-linear flow of a function. For example, consider the following two functions:

somefunction1:
    jmp  somefunction1_impl

somefunction2:
    ret  0004

somefunction1_impl:
    ret  0008

In this case, a linear scan would report both somefunction1 and somefunction2 as taking a single argument, whereas somefunction1 really takes two arguments.
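
The misdetection can be shown on a concrete byte image. The encoding below is a hypothetical layout of the example (a two-byte short jump followed by the two return instructions), and linear_guess is a hypothetical helper that scans forward from a given offset without following jumps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the example:
 *   offset 0  somefunction1:      jmp somefunction1_impl  (eb 03)
 *   offset 2  somefunction2:      ret 0004                (c2 04 00)
 *   offset 5  somefunction1_impl: ret 0008                (c2 08 00)
 */
static const unsigned char image[] = {
    0xeb, 0x03,
    0xc2, 0x04, 0x00,
    0xc2, 0x08, 0x00
};

enum { SOMEFUNCTION1 = 0, SOMEFUNCTION2 = 2, SOMEFUNCTION1_IMPL = 5 };

/* Linear scan from `start` for 0xc2 ll 0x00; returns ll / 4, or -1. */
static int linear_guess(const unsigned char *code, size_t len, size_t start)
{
    size_t i;

    assert(code != NULL);
    for (i = start; i + 2 < len; i++) {
        if (code[i] == 0xc2 && code[i + 1] <= 0x40 && code[i + 2] == 0x00)
            return code[i + 1] / 4;
    }
    return -1;
}
```

Scanning from SOMEFUNCTION1 runs straight past the jump into the body of somefunction2 and yields one argument; only a scan that starts at SOMEFUNCTION1_IMPL sees the ret 0008 and yields two.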

With these limitations in mind, it is possible to implement more stubs in Wine and, eventually, the functions themselves.