Optimizing synchronization objects
Daniel Santos
daniel.santos at pobox.com
Sun Sep 13 17:47:44 CDT 2015
Alexandre,
First off I apologize for the size of this email, I'm trying to keep it
as concise as possible.
I've been experimenting with ways to optimize synchronization objects
and have implemented a promising proof of concept for semaphores using
glibc's nptl (posix) semaphore implementation. I posted revision 3 of
this today, although I appear to have used the wrong msg id in the
--in-reply-to header. :( So my goal is to eventually make similar
optimizations for all synchronization objects, or at least those that
have demonstrable performance problems.
The basic theory of operation is that when a client sends a
create_semaphore, the server creates a posix semaphore with a unique
name, which it passes to the client process so that it can open it
locally. This allows the client to perform ReleaseSemaphore without a
server call as well as WaitFor(Multiple|Single)Object(s) for cases where
the wait condition can be determined to be satisfied without a server
call (i.e., either bWaitAll = FALSE and a signalled semaphore is found
in the handle list prior to any non-semaphore object, or bWaitAll = TRUE
and all handles are signalled semaphores). For all other conditions, it
uses a traditional server call.
However, it has two problems:
1. It relies on glibc's implementation of POSIX semaphores, which shares
them with other processes via shared memory, and
2. It relies on glibc's implementation of POSIX semaphores, which are
incompatible between 32- and 64-bit ABI processes.
I have not been able to find any more flaws in the case where the program
and the wineserver have the same ABI. All tests pass and I've added one more
(although more tests are clearly needed). Since this implementation only
uses sem_trywait (and never sem_wait or sem_timedwait), we don't really
even need a full-featured semaphore -- a simple 32- or 16-bit number
that's accessed atomically would suffice as a replacement. Although I
did plan to eventually explore having a client program block w/o calling
the server, the benefit of that is minimal compared to the benefit of
being able to avoid the server call for releasing a semaphore and
"wait"ing when the semaphore is already available.
So now I want to understand the minimum threshold of acceptability in
wine for such a mechanism. We discussed this quite a bit in chat, and I
can see many possibilities, each with its own particular issues. I'm
listing them in order of my personal preference (most preferred first).
Option 1: Simple shared memory & roll our own semaphore
Similar to what glibc's NPTL semaphores are doing, except that we would
only need a single integral value and not even a futex. The obvious
downside is that a process can corrupt this memory and cause dysfunction
of other processes that also have semaphores in that page. This could be
minimized by giving every process its own page that is only shared
between the server and the process unless a semaphore in that process is
shared with another program, at which time the memory page could be
shared with that process as well. Thus, the scope of possible corruption
is limited to how far you share the object.
In the worst case of memory corruption, the wineserver would either
leave a thread of one of these processes hung, release one when it
shouldn't be released, or detect that the memory is corrupted, issue
an error message, set the last error to something appropriate and return
WAIT_FAILED.
Option 2: System V semaphores
On Linux, these are hosted in the kernel, so you can't just accidentally
overwrite them. They will be slightly slower than shared memory due to
the system call overhead. You probably know them better than I, but at
the risk of stating the obvious, the following are their limitations.
Their max value on Linux is SHRT_MAX (SEMVMX), so any request for a higher
lMaximumCount would have to be clipped. There are also limits on Linux
that can be adjusted by root if needed for some application, but the
defaults are a maximum of 32000 (SEMMNS) total semaphores on the system,
128 (SEMMNI) semaphore sets and a max of 250 (SEMMSL) semaphores
per set. They are also persistent, so if the wineserver crashes, they
can leave behind clutter.
Option 3: Move semaphores completely into the client
In this scenario, the wine server can never be exposed to corrupted
data. It is very fast when locking can be performed in the client, but
very complicated and potentially slower for mixed locks. Calls to
WaitForMultipleObjectsEx containing both semaphores and other objects
(especially with bWaitAll = TRUE) may require multiple request/reply
cycles to complete. The client must successfully lock the semaphores
prior to the server calling satisfied on the server-side objects.
Here is an optimistic use case that only requires a single request/reply
cycle:
1. WaitForMultipleObjectsEx is called with bWaitAll = TRUE and a mix of
semaphores and other objects
2. Client calls trywait on all semaphores, which succeeds.
3. Client passes request to server (with semaphore states) and blocks
on pipe
4. Server gets value of all server-side objects and determines that the
condition can be satisfied, so calls satisfied on all objects
5. Server sends response to client
6. Client wakes up and completes the wait call.
Here is a slightly less optimistic case:
1. WaitForMultipleObjectsEx is called with bWaitAll = TRUE and a mix of
semaphores and other objects
2. Client calls trywait on all semaphores, which fails on one semaphore.
3. Client rolls back locks on all which had succeeded.
4. Client passes request to server (with semaphore states)
5. Client blocks on the semaphore that was locked and the server pipe
6. Server updates thread status (blocking on native object)
7. Semaphore is signaled and client wakes up
8. Lock is obtained on semaphore that was previously locked
9. Client now calls trywait on the remaining semaphores, which succeeds
this time.
10. Client sends update to server and blocks on pipe
11. Server checks all server-side objects, which are all signaled, so
calls satisfied on all objects
12. Server updates thread status and notifies client
13. Client wakes up and completes wait call.
As you can see, this can get more complicated. If the server discovers
that a server object isn't signaled, it will have to notify the client
to roll back the locks and wait for the server objects to be ready.
So which of these solutions is most appealing to you?