Speeding up wineserver synchronization objects with shared memory

Thu Feb 15 12:34:02 CST 2001

Hi everyone,

We've recently been working on getting American McGee's Alice (a visually stunning game, 
if you haven't seen it before) running well under Wine, and we've run into a serious speed
issue with synchronization objects like mutexes.

Currently, Alice runs at about 50% of the framerate it gets in Windows with the same graphics
driver (NVidia).  When we started investigating, it turned out that the reason for this
is that it's spending half of its time in the wineserver.  At first we assumed that this
was due to the fact that the GL thunks need to grab the X11 lock.  We realized that this
wasn't necessary for most GL calls if we're using a direct rendering GL implementation,
and turned off the locks.  There was no effect, because there really wasn't much contention
for the X11 lock.

After going through a number of similar Wine internal possibilities and getting nowhere,
we finally realized that the problem was the app itself.  It's grabbing and releasing
a mutex of its own bazillions of times each frame.  Since there's nothing much we can
do about that, we started thinking about the proposed Linux kernel module approach.
After re-reading the thread and looking over the prototype, I have to concur with
Alexandre's judgement: the prototype that exists is trying to do too much work.

After some more thinking, Ove and I have come up with a mechanism that should eliminate 
most of the wineserver overhead for mutexes and semaphores, without the need to resort
to a kernel module.  We're probably going to give this a try over the next few days, so
any feedback will be very much appreciated.

Here's what we've been discussing in private email:

============================================================================================
Ove writes:
> Gav writes:
>
> > Alternatively, I wonder if there's some way to speed up synchronization stuff
> > through the use of some kind of shared memory area that all wine processes know
> > about.  The shared memory area could be used to do mutexes with atomic test-and-
> > set operations.
> 
> Maybe. But we probably don't want extensive busy waits, so we'd need to
> call the wineserver when we need to wait. And the wineserver isn't really
> designed to do bus-locked atomic access to such shared areas itself. But
> perhaps with some client cooperation... in win32, a mutex is just a
> different (and slower) kind of a critical section, anyway (but since it's
> handle-based it can work across address spaces).
> 
> If each mutex had a wcount field shared among all clients, we could do...
> 
> ReleaseMutex:
>  wc = InterlockedDecrement(&wcount)
>  if wc > 0
>   call wineserver's ReleaseMutex
> 
> WaitForSingleObject:
>  wc = InterlockedIncrement(&wcount)
>  if wc < 0
>   return WAIT_OBJECT_0
>  call wineserver's WaitForSingleObject
> 
> which would at least do something about the
> ReleaseMutex/WaitForSingleObject pairs in the same thread...

That's exactly the kind of thing I was thinking about.  We can probably do the 
same for the CriticalSection semaphores as well.  I don't think that we can 
do anything to speed up Events though.  
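
For illustration, here is a minimal C sketch of the fast path in Ove's
pseudocode.  It is only a sketch under stated assumptions: the names
(shared_mutex and its helpers) are hypothetical, C11 atomics stand in for
InterlockedIncrement/InterlockedDecrement, and the counter is assumed to
start at -1 (unowned), the way a Win32 critical section's LockCount does.
The quoted pseudocode does not state an initial value, so the comparisons
here follow from that assumption.  Recursive acquisition and the actual
wineserver calls are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mutex counter living in memory shared by all clients.
 * Assumption: -1 means "unowned, no waiters". */
typedef struct {
    atomic_long wcount;
} shared_mutex;

static void shared_mutex_init(shared_mutex *m)
{
    assert(m != NULL);
    atomic_init(&m->wcount, -1);
}

/* Wait side.  Returns true if the mutex was taken without a server call.
 * On false the caller is now counted as a waiter and would fall back to
 * the wineserver's WaitForSingleObject. */
static bool shared_mutex_try_fast_acquire(shared_mutex *m)
{
    /* atomic_fetch_add returns the old value; adding 1 yields the new
     * value, which is what InterlockedIncrement returns. */
    long wc = atomic_fetch_add(&m->wcount, 1) + 1;
    return wc == 0;
}

/* Release side.  Returns true if at least one waiter remains, in which
 * case the caller would ask the wineserver to wake one. */
static bool shared_mutex_release_needs_server(shared_mutex *m)
{
    long wc = atomic_fetch_sub(&m->wcount, 1) - 1;
    return wc >= 0;
}
```

With this layout, an uncontended acquire/release pair in one thread never
leaves the process, which is the case Alice hits bazillions of times per
frame.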

So the next question is: what's the best way to manage the shared area for 
each mutex/semaphore?  We could just expose the wineserver handle table 
directly in the shared memory area, expanding the handle_entry struct in 
the server with a DWORD to serve as the count field.  Theoretically it brings
up security concerns, but I don't think that we care that much at this point.
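
As a rough sketch of what a shared table with a per-entry count might look
like: the struct below is hypothetical and is NOT the real wineserver
handle_entry, the table size is arbitrary, and an anonymous MAP_SHARED
mapping inherited across fork() stands in for whatever mechanism would
really expose the area to clients.  It also assumes atomic_long is
lock-free on the target, so that atomic operations work across processes.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared entry: only the count field is modelled here. */
struct shared_handle_entry {
    atomic_long count;    /* -1 = unowned (assumed convention) */
};

#define SHARED_TABLE_ENTRIES 1024   /* arbitrary size for the sketch */

/* Map a table that every process forked afterwards can read and write.
 * Returns NULL on failure. */
static struct shared_handle_entry *shared_table_create(void)
{
    size_t size = SHARED_TABLE_ENTRIES * sizeof(struct shared_handle_entry);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    struct shared_handle_entry *table = p;
    for (size_t i = 0; i < SHARED_TABLE_ENTRIES; i++)
        atomic_init(&table[i].count, -1);
    return table;
}

/* Demonstration helper: increment one entry from a child process and
 * return the value the parent sees afterwards (LONG_MIN on error). */
static long shared_entry_bump_in_child(struct shared_handle_entry *table,
                                       size_t idx)
{
    assert(idx < SHARED_TABLE_ENTRIES);

    pid_t pid = fork();
    if (pid < 0)
        return LONG_MIN;
    if (pid == 0) {
        atomic_fetch_add(&table[idx].count, 1);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return LONG_MIN;
    return atomic_load(&table[idx].count);
}
```

Every process that maps the table can write to every entry, which is
where the security concern comes from.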

============================================================================================

Thoughts, anyone?

-Gav

-- 
Gavriel State, CEO
TransGaming Technologies Inc.
http://www.transgaming.com
gav at transgaming.com