D3D performance debugging report

Stefan Dösinger stefandoesinger at gmx.at
Sat Apr 30 10:18:54 CDT 2011

Here's another update.

First I expanded my performance tests at 
a bit. The old tests were renamed to streamsrc_d3d and streamsrc_gl, and I
added another set of tests that measures only the draw overhead, without ever
changing any state: drawprim_d3d and drawprim_gl. Here are the performance
results from Windows 7:

drawprim_gl:	~1154 fps
drawprim_d3d:	~1160 fps

In Wine the D3D version gets 165.67 fps. The native Linux GL version gets
1791 fps, and the Windows GL version running in Wine gets about 600 fps (FIXME!).
Don't worry too much about the GL performance; this is mostly locking overhead.
More about that later.

I ran my usual D3D performance hacks against the D3D version. The hacks are
pretty much the same as with the streamsrc test, except that I don't need the
redundant vertex shader apply hacks. I attached a tarball with the hacks and a
file listing their performance impact.

The plan going forward is still the same: write more of those tests (especially
tests that exercise non-draw operations like resource loads), improve the
tests, and hope that real apps profit.

The optimistic scenario is that this works out. So far we've seen slow
movement in real apps with the two fixes we've made (context_validate and FBO
application; the latter isn't in Wine yet). This is expected to a certain
extent, because performance is inversely proportional to the number of
performance bugs we have. So we'll have to remove a lot of them before we see
big movement.
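
Because fps is the reciprocal of frame time, the same saving per frame looks
tiny while frames are slow and huge once they are fast. The sketch below is
plain arithmetic on fps figures like the ones in the attached table; the
helper names are made up for illustration and are not part of any test app.

```c
/* Time one frame takes at a given frame rate, in microseconds. */
double frame_time_us(double fps)
{
    return 1e6 / fps;
}

/* Per-frame time removed when a change takes us from fps_before to fps_after. */
double saved_us(double fps_before, double fps_after)
{
    return frame_time_us(fps_before) - frame_time_us(fps_after);
}

/* Frame rate after removing saving_us of work from every frame at fps. */
double fps_after_saving(double fps, double saving_us)
{
    return 1e6 / (frame_time_us(fps) - saving_us);
}
```

For example, shaving 100 microseconds off every frame raises 165.67 fps to
roughly 168.5 fps (about +2.8 fps), but raises 848.77 fps to roughly 927.5 fps
(about +79 fps), even though the removed work is identical.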

The pessimistic scenario is that those tests have nothing in common with the 
performance bugs in real apps and the fixes only end up making the code more 

To that end, I think I'll create a GitHub repo where I try to get the hacks
into a somewhat usable state: not committable to Wine, but good enough that
they don't break apps, so they can be tested against real-world apps. That way
we can find out how much they really improve real games without cluttering our
codebase with changes that may not help.

Here again are descriptions of some of the hacks I tested:

2) End-user business, fairly harmless. Should always be used if performance is 

3, 4) Will break stuff. Can be fixed, but the fix would be rather ugly.
Probably interesting once we run out of easier fixes.

5) Could go into Wine sooner or later. Does improve real games on its own 

6) Easy to clean up; I'll send a patch today. We can skip validation if FIXMEs
are off, since nobody will see them.
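
The idea behind 6) can be sketched as follows, under the assumption that the
FBO check is an expensive driver query (something like glCheckFramebufferStatus)
whose result is only ever reported as a FIXME. Everything here is
hypothetical: the fixme_enabled flag stands in for a debug-channel check, and
query_fbo_status() stands in for the real GL call.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "are FIXME messages enabled on this channel?". */
static bool fixme_enabled = false;

/* Counts how often the expensive query ran, for illustration only. */
static unsigned int status_queries = 0;

/* Hypothetical stand-in for an expensive driver round trip. */
static int query_fbo_status(void)
{
    ++status_queries;
    return 0; /* pretend the FBO is complete */
}

/* Validation whose only visible effect is a FIXME message. */
static void check_fbo_status(void)
{
    int status = query_fbo_status();
    if (status != 0)
        fprintf(stderr, "fixme: FBO incomplete, status %d\n", status);
}

/* Called on every draw in this sketch. */
void apply_fbo_state(void)
{
    /* ... attach render targets and bind the FBO here ... */

    /* Nobody can see the result when FIXMEs are off, so skip the query. */
    if (fixme_enabled)
        check_fbo_status();
}
```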

7) I tried to find out if removing one call level helps, but it doesn't even
improve this test app, which is very sensitive to locking overhead. Forget
about it.

8) Doable, but pretty uninteresting. I doubt we'll get a noticeable
improvement in a real app.

9-11) Distributor / end user choice. Note that some compiler flags (especially
the frame pointer one) can break apps and copy protection systems.

12) Distributor / end user choice too, but harmless. Not much gain compared to
WINEDEBUG=-all, though.
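
The difference between turning debug output off at runtime and compiling it
out can be sketched like this. With the runtime switch, every call site still
pays for a flag check; with the compile-time switch, nothing is left. The
macro and flag names (TRACE_MSG, NO_TRACE_MSGS, trace_enabled) are
hypothetical and do not reproduce Wine's actual debug macros.

```c
#include <stdbool.h>
#include <stdio.h>

/* Runtime switch, similar in spirit to running with WINEDEBUG=-all. */
static bool trace_enabled = false;

/* Build with -DNO_TRACE_MSGS to drop trace calls at compile time. */
#ifdef NO_TRACE_MSGS
#define TRACE_MSG(...) do { } while (0)
#else
#define TRACE_MSG(...) \
    do { if (trace_enabled) fprintf(stderr, __VA_ARGS__); } while (0)
#endif

/* Counts how often trace arguments were actually evaluated. */
static unsigned int args_evaluated = 0;

static int describe(int prim)
{
    ++args_evaluated;
    return prim;
}

void draw_call(int prim)
{
    /* Runtime off: one branch per call. Compiled out: no code at all. */
    TRACE_MSG("draw_call(%d)\n", describe(prim));
}
```

In both disabled cases the arguments are never evaluated, so the remaining
cost with the runtime switch is mostly the branch itself, which fits the small
gain of 12) over 2).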

13) Doesn't improve performance a whole lot once debug msgs are compiled out.

14) We should be able to limit calls to this function to cases where the
textures were changed or vertex texture fetch is used. We may be able to
eliminate it entirely when we have enough samplers available.
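
One way to limit those calls is a dirty flag that is set only when a texture
binding actually changes, as sketched below. The structure, field names and
sampler count are hypothetical simplifications, not wined3d's data
structures, and the identity mapping merely stands in for the real sampler
assignment done by findTexUnitMap.

```c
#include <stdbool.h>

#define MAX_SAMPLERS 16 /* hypothetical limit for this sketch */

/* Hypothetical, heavily simplified device state. */
struct device_state
{
    const void *textures[MAX_SAMPLERS];
    int tex_unit_map[MAX_SAMPLERS]; /* sampler -> texture unit */
    bool tex_map_dirty;             /* set when a texture binding changes */
    bool uses_vertex_texture_fetch;
    unsigned int map_rebuilds;      /* for illustration only */
};

void set_texture(struct device_state *s, unsigned int stage, const void *tex)
{
    if (s->textures[stage] == tex)
        return; /* redundant call: the map stays valid */
    s->textures[stage] = tex;
    s->tex_map_dirty = true;
}

static void rebuild_tex_unit_map(struct device_state *s)
{
    /* Identity mapping stands in for the real assignment logic. */
    for (unsigned int i = 0; i < MAX_SAMPLERS; ++i)
        s->tex_unit_map[i] = (int)i;
    ++s->map_rebuilds;
}

/* Called before every draw; only does work when something changed. */
void pre_draw(struct device_state *s)
{
    if (s->tex_map_dirty || s->uses_vertex_texture_fetch)
    {
        rebuild_tex_unit_map(s);
        s->tex_map_dirty = false;
    }
}
```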

15, 16) I caution against too much optimism here. We won't be able to get rid
of the locking anytime soon. Maybe the EnterCriticalSection /
LeaveCriticalSection performance can be improved. Part of the problem is
call overhead, but I think the biggest issue is the locked increment and
decrement operations in RtlEnterCriticalSection / RtlLeaveCriticalSection.
Orig performance: 178 fps
Interlocked ops replaced with normal inc/dec: 244 fps
Lock calls removed from wined3d: 293 fps
(this is just to give you some idea where the time is spent)
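
To get a feel for why locked increments matter even without contention, here
is a rough single-threaded microbenchmark sketch comparing plain and atomic
increment/decrement pairs, loosely modeled on an enter/leave pair. It is not
Wine code; the function names are made up, and absolute timings will vary a
lot between CPUs.

```c
#include <stdatomic.h>
#include <time.h>

/* volatile keeps the compiler from folding the plain loop away. */
static volatile long plain_counter;
static atomic_long locked_counter;

/* Plain increment/decrement pairs, like the "normal inc/dec" experiment. */
double time_plain_pairs(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
    {
        plain_counter = plain_counter + 1;
        plain_counter = plain_counter - 1;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same pairs as atomic read-modify-write operations (lock-prefixed on x86). */
double time_locked_pairs(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
    {
        atomic_fetch_add(&locked_counter, 1);
        atomic_fetch_sub(&locked_counter, 1);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both with, say, 100 million iterations and comparing the returned CPU
times gives a rough idea of the per-pair cost of the locked operations on a
given machine.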

17) Forget about this one until we run out of other optimizations.

18) It's interesting how much this gives without all the other optimizations. 
My app doesn't use any textures, so this is just the call overhead and looping
over the fragment samplers.

19) My app renders to a window that is too small, so the swapchain's
render_to_fbo path triggers. Interestingly, getting rid of it makes
performance worse.

21) Removing that and other checks in drawPrimitive() barely speeds up the
test. I got a total of 7-8 fps out of the compatibility and error checks in
drawPrimitive; this won't show up in any real app.

-------------- next part --------------
 0) Original:                           165.67fps
 1) Hack out device_preload_textures    187.08fps       (see 18, reverted. More than expected)
 2) WINEDEBUG=-all                      178.43fps
 3) rt location housekeeping            186.29fps
 4) ds location housekeeping            200.93fps
 5) redundant fbo checks                254.99fps
 6) FBO validation                      293.61fps
 7) mutex call level                    294.08fps       (essentially unchanged, reverted)
 8) SetPrimitiveType checking           297.80fps
 9) CFLAGS="-g -O2"                     297.80fps       (supposed to change nothing)
10) CFLAGS="-g -O3"                     308.05fps
11) CFLAGS="-g -O3 -fomit-fp"           337.11fps       (yep, I know it is -fomit-frame-pointer)
12) compile out debug msgs              341.77fps       (huh, once this ran slower)
13) no -fPIC                            344.21fps       (ok, not that much change here)
14) findTexUnitMap hack                 390.15fps
15) no wined3d mutex                    449.40fps       (unrealistic)
16) no gl locks                         848.77fps       (Santa Claus is real)

---- Notice ----
From here on (and probably from an earlier point) the individual items show
misleadingly large performance improvements. Keep the inverse proportionality in mind.

17) context level counting              887.41fps
18) hack out device_preload_textures    973.35fps       (let's see how much that does without the other hacks)
19) turn render to fbo off              919.39fps       (interesting, reverted)
20) Turn off GLSL                       973.35fps       (note: the shader and constants are static, reverted)
21) remove point sprite warning         976.68fps       (margin of error)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hacks.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 10740 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20110430/562083de/attachment.bin>

More information about the wine-devel mailing list