[1/10] Check for ms_hook_prologue attribute support, make a first function hookable

Stefan Dösinger stefan at codeweavers.com
Tue Oct 13 16:37:18 CDT 2009


On 13.10.2009 at 22:50, Nikolay Sivov wrote:

> Stefan Dösinger wrote:
>> This makes use of the gcc attribute added to gcc svn yesterday: http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00319.html
>> ------------------------------------------------------------------------
> Hi, Stefan. If you have time, could you briefly explain what this
> is about? How is this hook going to work?
> I guess it's something like hard wrapping for forwarding API calls
> to something non-default, or am I wrong?
CC'ing Wine-devel, since other people will probably have the same  
question.

Steam has a feature that Valve calls "In Game Overlay". It lets you
access certain Steam features, such as chat and voice chat, from
inside the game, and it works without modifying the game. Other apps,
like Xfire, have a similar feature.

To make the overlay work, Steam injects a DLL into the game before
the game starts running. It creates the game process suspended,
allocates memory in that process, and writes some DLL loading code
there. Then it points the entry point at that allocated area. That
code calls LoadLibrary, which loads GameOverlayRenderer.dll and runs
its DllMain, and then calls the original game entry point. This part
already works fine.

Next, GameOverlayRenderer.dll has to intercept keyboard input, and it
has to draw its own graphics on top of the game graphics before the
final result is sent to D3D. To do this, it tries to intercept calls
like opengl32.wglSwapBuffers and IDirect3DSwapChain9::Present.
Furthermore, it hooks calls like LoadLibrary to find out whether the
game uses D3D9, and CreateProcess to catch child processes.

Steam's hooker doesn't blindly overwrite code bytes like other apps do.
It disassembles the code and tries to preserve the instructions it
displaces. It frees up the first 5 bytes of the function and places an
unconditional immediate jump to its own code there. Once its own code
has done its business, it executes the displaced instructions and
jumps to the next original instruction.

However, Steam's disassembler doesn't know all possible opcodes. For
example, it doesn't recognize

89 e5	mov %esp, %ebp

which is the same as

8b ec	mov %esp, %ebp

which is the encoding MSVC generates.

Furthermore, it can't handle the LEA instruction generated on OS X to
align the stack, and it doesn't recognize an xor %eax, %eax used to
clear the %eax register, among others.

Starting with Windows XP SP2, Microsoft added a 2-byte nop at the start
of each function, plus 5 bytes of padding just above the entry point.
As a result, many Win32 API functions look like this on Windows:

90		nop		; cc		int 3 in some functions
90		nop		; cc		int 3
90		nop		; cc		int 3
90		nop		; cc		int 3
90		nop		; cc		int 3
func:
8b ff		movl.s %edi, %edi
55		pushl %ebp
8b ec		movl.s %esp, %ebp

Microsoft's idea is to replace the 8b ff with a short jump back into
the padding (eb f9), and to replace the 5 nops with a jmp dst. They
use this for hotpatching, i.e. applying security fixes to Windows DLLs
without restarting the app. They have special compiler and linker
switches (/hotpatch and /FUNCTIONPADMIN) to create hotpatchable images.

Since Steam makes a fairly reasonable assumption that a function
starts with 8b ff 55 8b ec, Alexandre and I agreed that it makes sense
to generate this prologue for all Wine functions that Steam and other
apps try to hook. However, the MSVC compiler switches are pretty
obscure: they only add the 8b ff, they don't work together with /O2
(only with /O1), and they break if you use the MSVC equivalent of
-fomit-frame-pointer. So the gcc maintainer and we decided not to
clone this feature 1:1 in gcc; instead, we added a function attribute.

The ms_hook_prologue attribute is meant to make sure that the 5 bytes
before and after the function entry point look like the code above,
no matter what other options, attributes, etc. are used. For example,
if -fomit-frame-pointer is used, gcc generates this code:

90 90 90 90 90
func:
8b ff		movl.s %edi, %edi
55		pushl %ebp
8b ec		movl.s %esp, %ebp
??		popl %ebp
< other code >
ret

The first 5 bytes are marked as unspec volatile, so the optimizer will
not try to optimize them away, even at -O6. If %ebp is also pushed as
part of <other code>, though, the optimizer notices this. So gcc first
emits the 8b ff 55 8b ec, and only then worries about reconciling that
with what the rest of the function actually needs.

Currently gcc only generates the 5 bytes after the function start; it
doesn't generate the nops yet. I am working on this, but it will take
a bit longer because it needs some changes in the backend-frontend
interaction when generating function alignment. Furthermore, there's
a bug in binutils: when you explicitly align a function with
.align X, 0x90 (i.e., you request that 0x90 bytes be used for
alignment), it optimizes those nops into a single 3-, 4-, 5-, ...
byte nop.




