[PATCH 01/10 v2] server: Introduced iosb struct for server-side IO_STATUS_BLOCK representation and use it in irp_call.

Jacek Caban jacek at codeweavers.com
Mon Oct 24 08:46:40 CDT 2016


On 20.10.2016 19:17, Sebastian Lackner wrote:
> On 19.10.2016 19:05, Jacek Caban wrote:
>> v2: Don't store async queues in a list.
>>
>> Signed-off-by: Jacek Caban <jacek at codeweavers.com>
>> ---
>>  server/async.c  | 34 +++++++++++++++++++++++++++++++++
>>  server/device.c | 58
>> +++++++++++++++++++++++++--------------------------------
>>  server/file.h   | 14 ++++++++++++++
>>  3 files changed, 73 insertions(+), 33 deletions(-)
>>
> I'm not sure if Alexandre has already agreed to this approach, but from my
> point of view it would make sense to do some measurements. How fast is the
> proposed named pipe approach compared to alternatives (unpatched Wine, or
> the Linux-only implementation included in Wine Staging)?

As Alexandre said, I think we should concentrate on correctness first,
but I did some benchmarking that should give us an overview. Proper
measurement depends on many factors, and the different implementations
have different weaknesses. Here are a few things to keep in mind:

- My patchset leaves room for improvement, as explained in [1]. I have
prototyped patches for those optimizations and tested both with and
without them.

- Calls with immediate results have very different characteristics than
ones that need to wait. If we need to wait, then both overlapped calls
and blocking select require server round trips, and the extra data that
my patchset needs to transfer (as long as it's not too big) shouldn't
make a significant difference on top of those. When running my test with
a large buffer, most calls should have immediate results, so I expect it
to test the worst-case scenario.

- The current Wine implementation uses blocking socket calls for
non-overlapped pipes. This significantly affects the results, so I
tested both overlapped and non-overlapped mode (although the calls
themselves are never actually overlapped); the sketch after this list
illustrates the difference between the two modes.

- Small buffer sizes would be interesting to test, because they cause
more blocking and would thus show the differences in that case. Sadly,
the Staging implementation doesn't take that into account, so the
results would be meaningless.

I used the attached test with the following parameters:
npbench.exe 1024 10000 100000 (for non-overlapped)
npbench.exe 1024 10000 100000 1 (for overlapped)
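The attached npbench.c is what I actually ran; purely for illustration,
a stripped-down round-trip benchmark along the same lines could look
like the following (the meaning of the command-line parameters is my
guess, and the overlapped variant would additionally need the OVERLAPPED
handling shown in the sketch above):

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define PIPE_NAME "\\\\.\\pipe\\npbench"

static DWORD buf_size = 1024;

/* Server side: echo every message back to the client. */
static DWORD WINAPI server_thread(void *arg)
{
    HANDLE pipe = arg;
    char *buf = malloc(buf_size);
    DWORD size;

    ConnectNamedPipe(pipe, NULL); /* ERROR_PIPE_CONNECTED here is harmless */
    while (ReadFile(pipe, buf, buf_size, &size, NULL))
        WriteFile(pipe, buf, size, &size, NULL);
    free(buf);
    return 0;
}

int main(int argc, char **argv)
{
    DWORD iterations = 10000, size, i;
    LARGE_INTEGER freq, start, end;
    HANDLE server, client;
    char *buf;

    if (argc > 1) buf_size = atoi(argv[1]);   /* assumed: buffer size */
    if (argc > 2) iterations = atoi(argv[2]); /* assumed: round-trip count */

    buf = calloc(1, buf_size);
    server = CreateNamedPipeA(PIPE_NAME, PIPE_ACCESS_DUPLEX,
                              PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                              1, buf_size, buf_size, 0, NULL);
    CreateThread(NULL, 0, server_thread, server, 0, NULL);
    client = CreateFileA(PIPE_NAME, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                         OPEN_EXISTING, 0, NULL);

    /* Time write+read round trips through the pipe. */
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    for (i = 0; i < iterations; i++)
    {
        WriteFile(client, buf, buf_size, &size, NULL);
        ReadFile(client, buf, buf_size, &size, NULL);
    }
    QueryPerformanceCounter(&end);

    printf("%.3f us per round trip\n",
           (end.QuadPart - start.QuadPart) * 1000000.0 / freq.QuadPart / iterations);
    return 0;
}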

Results are also attached. Proper benchmarking would require a lot more
care, but I think they show what you asked for. This implementation is
3.3 times slower in the overlapped case and 7.4 times slower in the
non-overlapped tests (3.1 and 2.5 times slower, respectively, compared
to Windows in a VM). I'd say that's not too bad. The overlapped case is
even faster than I expected.

BTW, did you expect the Staging patches to be so much faster than plain
Wine in the overlapped case? I didn't look too deeply into the code;
maybe there is some optimization, but it's a bit suspicious.

> In general I like the idea, but I fear that adding significant optimizations
> afterwards will be very difficult.

FWIW, it should be fairly easy to rebase the Staging patches on top of
my series. It leaves the previous implementation mostly untouched for
the byte-mode case. That, however, may not remain true after further
changes, especially if we decide to use the server I/O implementation
for the byte-mode case as well.

Thanks,
Jacek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: npbench.c
Type: text/x-csrc
Size: 2154 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20161024/34d0787b/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: results.csv
Type: text/csv
Size: 181 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20161024/34d0787b/attachment.csv>

