Speeding up wineserver synchronization objects with shared memory
dhowells at cambridge.redhat.com
Wed Mar 7 03:34:40 CST 2001
> Note that we are no longer doing that in the latest versions; the file
> descriptor is only transferred once,
Fair enough... I see that ZwClose/NtClose isn't actually a problem (since
unlike most other Zw* calls, it can't affect other processes).
Oh... I see how you're doing it: sending the handle->fd translation request to
the server, which sends back a response saying you've got it cached; then using
dup() locally to emulate the old behaviour; and then closing the fd.
So this saves you the cost of the fd-transfer network packet. Though you still
have to pay for the two context switches, which is my main contention.
> and all further requests are done on a pipe which is faster than a socket.
True, but I'd have thought that the context switches involved are still a cost
you can't get rid of so easily. Out of interest, how do you plan on doing the
locking for Read/WriteFile? Cache it locally? Unfortunately, you can't really
make use of UNIX file locking here, since it's only advisory and so doesn't
actually stop read() and write() calls.
> The kernel module itself may be hard to do incrementally, but you
> should really consider reusing the existing server API so that your
> module can be plugged in easily. For instance your module entry points
> should be the same as the server requests, and use the same request
What? Miss the opportunity to implement "int 0x2e" directly? *grin*
Seriously, though, whilst this'd be a lot easier in many ways (and it would
allow you to avoid the context-switch penalties), you wouldn't be able to take
full advantage of the available support in the kernel, which is more capable
than the standard UNIX userspace API suggests.
It'd still have to map handles to fds for most file-operation calls, and
you'd still have the PE images soaking up a fair amount of memory.
If this is what you want, then it might be better done as a network protocol
module that just pretends to be a wineserver, and supports the same
read/write/sendmsg/recvmsg interface. (It'd have to be a network protocol to
be able to get sendmsg/recvmsg calls):
int serv = socket(AF_WINE, SOCK_STREAM, 0);
struct sockaddr addr = { .sa_family = AF_WINE };
connect(serv, &addr, sizeof(addr));
> I still think that it should be possible to improve that by a small
> kernel hack. It will never be as fast as doing everything in the
> kernel of course, but it may just be fast enough to avoid the need to
> reimplement the whole server.
If you want to suggest exactly what you'd like to see as a hack...
> Have you measured how many dirty pages you can avoid with your change?
> It seems to me that in most cases, when the dll is loaded at its
> preferred address, the number of pages made dirty by the fixups should
> be quite small anyway.
As far as I've observed (I've got Win2000 available), most Windows DLLs have
512-byte (sector) alignment internally, _not_ 4096-byte (page) alignment for
the sections. This means that the separate sections can't be mmap'd (or else
they'd lose their required relative relationships):
/* mmap() failed; if this is because the file offset is not */
/* page-aligned (EINVAL), or because the underlying filesystem */
/* does not support mmap() (ENOEXEC,ENODEV), we do it by hand. */
This appears to happen a lot. And then _all_ the pages in that section are
dirty, irrespective of whether fixups are done or not.
Also, since DLLs and EXEs are not compiled as PIC (the MSDEV compiler not
having such an option as far as I can recall), the fixup tables usually seem
to apply to just about every page in the code section.
I'll have to write a small program to collect some statistics :-)
As for the DLL being loaded at its preferred address: the kernel module skips
the fixup stuff entirely, and doesn't even consider trying to perform it.
Plus, pages that have been altered by the fixup code are actually marked
_clean_ by the VM subsystem, and can thus simply be discarded when physical
memory needs to be reclaimed.