[1/10] Check for ms_hook_prologue attribute support, make a first function hookable
Stefan Dösinger
stefan at codeweavers.com
Tue Oct 13 16:37:18 CDT 2009
On 13.10.2009 at 22:50, Nikolay Sivov wrote:
> Stefan Dösinger wrote:
>> This makes use of the gcc attribute added to gcc svn yesterday: http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00319.html
>> ------------------------------------------------------------------------
> Hi, Stefan. Could you briefly explain, if you have time, what this
> is about? How is this hook supposed to work?
> I guess it's something like hard wrapping for forwarding API calls
> to something non-default, or am I wrong?
CC'ing Wine-devel, since other people will probably have the same
question.
Steam has a feature which Valve calls "In Game Overlay". You can
access certain Steam features, such as text or voice chat, from
inside the game. This works without modifying the game. Other apps
like Xfire have a similar feature.
To make the overlay work, Steam injects a DLL into the game before the
game starts. It creates the process suspended, allocates remote
memory, and puts some DLL loading code there. Then it changes the
entry point to that allocated area. That code calls LoadLibrary on
GameOverlayRenderer.dll, which runs the DLL's DllMain, and then calls
the original game entry point. This part works fine already.
Now GameOverlayRenderer.dll has to intercept keyboard input, and it
has to draw its own graphics on top of the game graphics before the
final result is presented. To do this, it tries to intercept calls
such as opengl32.wglSwapBuffers and IDirect3DSwapChain9::Present.
Furthermore, it hooks calls like LoadLibrary to find out whether the
game uses D3D9, and CreateProcess to catch child processes.
Steam's hooker doesn't blindly replace code bytes like other apps do.
It disassembles the code and tries to preserve the instructions it
overwrites. It tries to free up the first 5 bytes and places an
unconditional 5-byte jump to its own code there. Once its own code
has done its business, it executes the displaced instructions and
jumps to the next original instruction.
Now Steam doesn't know all possible opcodes. For example, it doesn't
know
89 e5 mov %esp, %ebp
which is the same as
8b ec mov %esp, %ebp
generated by MSVC.
Furthermore, it can't deal with the LEA generated on OS X to align the
stack, and it doesn't recognize the xor %eax, %eax used to clear the
eax register, etc.
Starting with WinXP SP2, Microsoft added a 2-byte nop at the start of
each function. It also adds 5 nops before the entry point. So many
Win32 API functions look like this on Windows:
90 nop ; cc int 3 in some functions
90 nop ; cc int 3
90 nop ; cc int 3
90 nop ; cc int 3
90 nop ; cc int 3
func:
8b ff movl.s %edi, %edi
55 pushl %ebp
8b ec movl.s %esp, %ebp
Microsoft's idea is to replace the 8b ff with a 2-byte short jump back
by 5 bytes (eb f9), and to replace the 5 nops with a jmp dst. They use
this for hotpatching, to apply
security fixes to Windows DLLs without restarting the app. They have
special compiler and linker switches to create hotpatchable images.
Since it is fairly reasonable for Steam to assume that a function
starts with 8b ff 55 8b ec, Alexandre and I agreed that it makes sense
to do this for all Wine functions that Steam and other apps try to
hook. However, the MSVC compiler switches are pretty obscure. They only
add the 8b ff, they don't work together with /O2 (only /O1), and they
break if you use the MSVC equivalent of -fomit-frame-pointer. So we
decided, together with the gcc maintainer, not to clone this feature
1:1 in gcc; instead, we added a function attribute.
The ms_hook_prologue attribute makes sure the 5 bytes before and after
the function entry point look like the code above. This happens no
matter what other options, attributes, etc. are used. If, e.g.,
-fomit-frame-pointer is used, it generates this code:
90 90 90 90 90
func:
8b ff movl.s %edi, %edi
55 pushl %ebp
8b ec movl.s %esp, %ebp
5d popl %ebp
< other code >
ret
The first 5 bytes are marked as unspec volatile, so the optimizer will
not try to optimize them away, even at -O6. If <other code> would push
%ebp itself, though, the optimizer realizes that this has already been
done. So gcc first emits the 8b ff 55 8b ec, and only then bothers
about reconciling that with what the rest of the function actually
needs.
Currently gcc generates only the 5 bytes after the function start; it
doesn't generate the nops yet. I am working on this, but it will take
a bit longer because it needs some changes in the backend-frontend
interaction when generating function alignment. Furthermore, there's
a bug in binutils: when you explicitly align a function with
.align X, 0x90 (i.e., you request that 0x90 bytes be used for
alignment), it optimizes those nops into a single 3, 4, 5, ... byte
nop.