wined3d performance patches

Stefan Dösinger stefandoesinger at
Mon Jun 6 10:55:43 CDT 2011

This is intended mostly for the other d3d developers, but we have quite a 
number of them now, so individual CCs are a lot of work :-)

I attached the patches I currently have in my tree to give an update on what 
I've been working on recently. The main aim of those patches is to reduce draw 
overhead a bit, thus improving game performance. The patches need some 
cleanup, but for that I first need a patch Matteo is working on.

Feedback is welcome. I'm also interested in test results, e.g. if the changes 
break a game, or the performance impact. If those patches cause a 5% 
performance increase I am happy.

Patches 1-3: Mostly unrelated. I haven't sent them yet because patch 3 breaks 
Unigine Heaven, and patches 1 and 2 make little sense without 3.

Patch 4: This removes a hack for a driver bug workaround. I have to do more 
testing on my old machines to find out if the bug is really fixed in newer 
nvidia drivers.

Patches 5, 6: They keep track of changes to the framebuffer setup so we don't 
have to run through the code that figures out which FBO to bind every draw. 
Patch 5 gets rid of the ordering assumption. Patch 6 applies the FBO only when 

They aren't ready yet. In patch 6 the FBO may have to be reapplied when the 
pixel shader changes. To implement that I need some draw buffer tracking 
infrastructure Matteo is working on. Clears can also be integrated; 
fbo-clear.diff is a half-baked attempt to do this. I dropped it when I realized 
I was duplicating Matteo's work. After that I have to double-check that I took 
care of all situations where the FBO may have to be updated.
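
To make the idea behind patches 5 and 6 concrete, here is a minimal sketch of 
dirty-flag tracking for the framebuffer setup. It is not the code from the 
patches: the structure, the function names and the lookup stub are all made up 
for illustration, and no real GL calls are made. The point is only that the 
per-draw path skips the expensive FBO lookup unless something has marked the 
setup dirty.

```c
/* Hypothetical stand-in for a GL context; not a wined3d structure. */
struct fake_context
{
    unsigned int render_target; /* id of the currently set render target */
    unsigned int bound_fbo;     /* FBO chosen by the last lookup */
    int fbo_dirty;              /* nonzero if the setup changed since then */
    unsigned int lookup_count;  /* how often the slow path ran */
};

/* Stub for the expensive "figure out which FBO to bind" code. */
static unsigned int find_fbo_for_target(struct fake_context *ctx)
{
    ++ctx->lookup_count;
    return ctx->render_target + 100; /* arbitrary mapping for the demo */
}

/* Changing the render target only marks the setup dirty. */
static void set_render_target(struct fake_context *ctx, unsigned int target)
{
    if (ctx->render_target == target) return;
    ctx->render_target = target;
    ctx->fbo_dirty = 1;
}

/* Called once per draw: re-resolve the FBO only when the setup is dirty. */
static void apply_fbo_for_draw(struct fake_context *ctx)
{
    if (!ctx->fbo_dirty) return;
    ctx->bound_fbo = find_fbo_for_target(ctx);
    ctx->fbo_dirty = 0;
}

/* Issues 1000 fake draws with one render target change in the middle
 * and returns how many times the slow lookup ran. */
static unsigned int run_demo(void)
{
    struct fake_context ctx = {0, 0, 1, 0}; /* starts dirty */
    int i;

    for (i = 0; i < 1000; ++i)
    {
        if (i == 500) set_render_target(&ctx, 1);
        apply_fbo_for_draw(&ctx);
    }
    return ctx.lookup_count;
}
```

In this toy setup run_demo() performs two lookups for 1000 draws: one for the 
initial dirty state and one after the render target change. The hard part in 
the real code is the one mentioned above, namely finding every situation that 
has to set the dirty flag.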

Furthermore, Matteo says that not calling context_apply_draw_buffers every 
time framebuffer() is run is a noticeable performance improvement too. Matteo, 
did you test this with just patch 0005, or both 0005 and 0006?

Patch 0007: Sampler map optimization. It has a lengthy description in the 
patch file.

Patch 0008: A tiny fix, it results in a pretty small improvement on OSX. On 
Linux+Nvidia it is not noticeable.

Patch 0009: At first I tried to skip the render target dirtification entirely 
via a flag in the d3ddevice, but it was pretty tricky and ugly. Just making it 
cheaper gets us about 2/3 of the way there. (Draw overhead tester performance 
without this: 259 fps. Complete disabling of the dirtification calls via a 
hack: 275. With this patch: 269)
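
For reference, the arithmetic behind the "2/3" estimate: the hack gains 
275 - 259 = 16 fps over the baseline, the patch gains 269 - 259 = 10 fps, and 
10/16 = 0.625. The helpers below are a small sketch of that calculation; the 
function names are made up for illustration and are not part of any test tool.

```c
/* Fraction of the best-case gain that a change recovers.
 * baseline_fps: result without the change
 * best_fps:     result with the cost removed entirely (the hack)
 * patched_fps:  result with the actual patch */
static double recovered_fraction(double baseline_fps, double best_fps,
        double patched_fps)
{
    return (patched_fps - baseline_fps) / (best_fps - baseline_fps);
}

/* Time spent per frame, in milliseconds, at a given frame rate. */
static double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}
```

recovered_fraction(259.0, 275.0, 269.0) evaluates to 0.625. Converting with 
frame_time_ms() is useful when comparing draw overhead, because equal fps 
steps do not correspond to equal time savings per frame.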

Patch 0010: An unrelated cleanup.

Patches 11, 12: Preparation for including clears in the fbo dirtification 
patches. See fbo-clear.diff.

More work on performance is obviously required, for example

*) Separate vertex declaration, vertex shader and pixel shader states
*) Speed up sampler preloading. This will be easier once we have a tree-like 
state structure.
*) Write more tests for other common operations, like clears, blits, shader 
changes, texture changes, vertex buffer changes, dynamic resource loading
*) Test our shaders' GPU-side execution performance
*) See if we can do something about locking
*) Isolate bottlenecks in the GPU drivers and get them fixed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patches.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 13345 bytes
Desc: not available
URL: <>

More information about the wine-devel mailing list