RFC: Hybrid (wineserver / process-local) native semaphores
Daniel Santos
daniel.santos at pobox.com
Thu Sep 10 01:07:48 CDT 2015
This patch set fixes most performance problems caused by
ReleaseSemaphore() and WaitForSingle/MultipleObject(s) making server
calls. One victim of this is Star Wars Battlefront
(https://bugs.winehq.org/show_bug.cgi?id=29582), where the majority of
the CPU time is spent on context switching while the program spams
ReleaseSemaphore, WaitForSingle/MultipleObject(s) and
GetForegroundWindow, the latter of which I made a work-around hack for
that some players have been using.
This patch set doubles performance in bug #29582. (When combined with
the GetForegroundWindow hack, the problem is completely resolved.) The patch
set works by having the server create a POSIX semaphore object and
sharing the key to that object with the client process, enabling the
client process to implement ReleaseSemaphore and
optimistic-case wait calls (where no blocking is required) without a
server call. Blocking waits and any wait-multiple that cannot be
resolved in the client process (e.g., bWaitAll=TRUE and the objects include
non-semaphores) are still handled by the server. (Implementing blocking
wait calls on the client can yield some performance improvements because
a context switch to another thread in the same program won't require
swapping out the memory map & such, but I would expect this to be less
significant.)
However, upon further experimentation, I discovered that POSIX
semaphores in glibc are actually implemented using a shared memory page,
which may not be acceptable: a bad process can corrupt that page and
potentially cause sem_* calls in the server to fail, as well as make
other client programs fail and/or deadlock. I am working on a System
V adaptation, but I thought it would be a good idea to get feedback and
comments now.
Another problem is that this causes the threadpool test to fail at line
1299, where the previous "release all semaphores and wait for callback"
test completes in reverse order. I presume this is due to the
Linux scheduler being inconsistent with how Windows *happens* to
schedule its threads. I already have an idea for a fix, but I
will still have to dig deeper into it.
The code is still of experimental quality (assert(0)s and such) and I've
already re-worked the configure.ac stuff; I'm mostly concerned with
feedback on the general scheme.
Thanks!
Daniel