WineD3D state management

Stefan Dösinger stefandoesinger at gmx.at
Mon Nov 27 04:20:11 CST 2006


Hi,
Over the past few days I've been hacking on implementing my state management 
ideas, and I think I've reached a state where I don't have to be completely 
ashamed of my patches :-)

First, what the code does NOT do yet:
* Pixel Shaders, GLSL shaders: I only had my notebook with the M9 available, 
so I had no chance to implement them. Expect anything from broken graphics to 
the sudden release of Duke Nukem Forever if you try to use them.

* Stateblocks
* Register combiners: Disabled right now
* Offscreen rendering: Causes random rendering garbage
* 2D Blits: Commented out

I have described the basic ideas in earlier mails 
(http://www.winehq.org/pipermail/wine-devel/2006-October/051868.html), 
so I won't describe them again here. I pretty much followed the original 
plan.

Performance:
One of the aims was to get better performance, since we apparently lost 
performance due to excessive state changes, which eat CPU time and may require 
CPU-GPU syncs. My patches improve performance, but not as much as I 
originally hoped. I mainly have performance figures for the M9, plus some 
basic testing on a gf7600.

* Billboard dx8 sdk demo: went from 56 fps to 107 fps :-)
* Half-Life 1: Quite an improvement here too, 110->150 fps in one of my 
timedemos. The d3d renderer now outperforms the opengl renderer (140 fps). 
Both the billboard demo and hl1 hit a special rendering case (no stream source 
or fvf changes), which is nicely optimized by my changes. The gl renderer in 
hl1 uses immediate mode drawing while wined3d can use VBOs and array drawing, 
thus being faster on today's cards.
* Battlefield 1942: Slight improvement too, 32->37 fps in my test 
scene (spawn point on a U.S. carrier at full graphics). BF1942 already 
exceeded the usual linux/windows driver performance ratio before my changes, 
so I assume I'm pretty much at the limit of my M9 here.
* 3DMark2000: Unfortunately my driver crashes it before showing the scores, so 
I can only watch the in-test counter. Seems to get +5 to +10 fps in the low 
detail helicopter test (resolution independent). Native msvcrt.dll gets 
another +5 fps.

I did only brief testing on my geforce7600:
* 3dmark2000: gets 11500 3dmarks, and 14500 when forcing drawStridedFast. This 
is, I believe, the windows performance. However, the benchmark is too old to 
be meaningful. Before my state patches the drawStridedFast score was around 
13500 if I remember correctly; I have to retest.
* 3dmark2001: Low detail tests run at 150-300 fps, too fast for a meaningful 
result. High detail tests are slow and partially broken due to offscreen 
rendering.
* Battlefield 1942: Runs at a steady 100 fps, but it already did that before

So it seems that the state patches improve one bottleneck, but we still have 
many others (offscreen rendering, drawStridedSlow) left. The nvidia profiling 
driver may help here.


Where to go from here:
The state management was also planned to make implementing other features 
easier:

* Multithreading: Make the dirty state list per context, along with the 
helpers currently stored in the device. Before applying the states, activate 
the correct context for the thread.

* Stateblocks: The basic idea is to record a display list and call it:
/* Record the GL calls of every state handler into one display list */
glNewList(stateblock->listname, GL_COMPILE);
for(i = 1; i <= STATE_HIGHEST; i++) {
    States[i].func(i, stateblock);
}
glEndList();

/* To apply the stateblock, replay the recorded list */
glCallList(stateblock->listname);

Ok, we need to split the list to apply only partial states, and the for loop 
can be improved to create a more efficient list. When the stateblock is 
altered we have to recreate the list. That's the basic idea...

* Offscreen rendering: Depends on whether we need separate contexts for 
pbuffers. If yes, include it in the multithreading ctx lookup and then apply 
the states; otherwise I think we can make selecting the pbuffer/fbo a state 
like all others. This interacts with the viewport (I think) and the 
projection matrix (render_offscreen for upside down rendering).

* sRGB textures: Dirtifies the sampler. All textures now have information 
about how many samplers they are bound to, and the number of one of those 
samplers. Phil?

* Vertex samplers: Ivan said he'd need the state management for them. My idea 
is to build a d3d sampler - gl sampler mapping in SetTexture, which will be 
needed for register combiners too. Based on that we can bind vtf samplers in 
gl.
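
The per-context dirty state list from the multithreading item can be sketched 
roughly as below. This is a minimal, self-contained illustration and not the 
actual patch: struct context, struct device, device_set_state, 
context_apply_dirty and the tiny STATE_HIGHEST value are all made-up names, 
and the handler only copies an integer where the real code would issue GL 
calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; none are taken from the wined3d sources. */
#define STATE_HIGHEST 4   /* tiny state count, for illustration only */
#define MAX_CONTEXTS  2   /* e.g. one GL context per thread */

struct context {
    unsigned int  dirty_list[STATE_HIGHEST + 1]; /* states waiting to be applied */
    size_t        num_dirty;
    unsigned char is_dirty[STATE_HIGHEST + 1];   /* prevents duplicate list entries */
    int           applied[STATE_HIGHEST + 1];    /* stand-in for real GL state */
    unsigned int  apply_calls;                   /* counts handler invocations */
};

struct device {
    int            values[STATE_HIGHEST + 1];    /* current D3D-side state values */
    struct context contexts[MAX_CONTEXTS];
};

typedef void (*state_func)(unsigned int state, const struct device *dev,
                           struct context *ctx);

/* Stand-in for a handler that would issue GL calls in real code. */
static void apply_generic(unsigned int state, const struct device *dev,
                          struct context *ctx)
{
    ctx->applied[state] = dev->values[state];
    ctx->apply_calls++;
}

/* Indexed by state number; entry 0 is unused, matching a 1-based state loop. */
static const state_func state_table[STATE_HIGHEST + 1] = {
    NULL, apply_generic, apply_generic, apply_generic, apply_generic
};

static void device_init(struct device *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/* Changing a state only records it; each context re-applies it later. */
static void device_set_state(struct device *dev, unsigned int state, int value)
{
    size_t i;

    assert(state >= 1 && state <= STATE_HIGHEST);
    dev->values[state] = value;
    for (i = 0; i < MAX_CONTEXTS; i++) {
        struct context *ctx = &dev->contexts[i];
        if (!ctx->is_dirty[state]) {
            ctx->is_dirty[state] = 1;
            ctx->dirty_list[ctx->num_dirty++] = state;
        }
    }
}

/* Called before drawing, after the thread's context has been made current. */
static void context_apply_dirty(const struct device *dev, struct context *ctx)
{
    size_t i;

    for (i = 0; i < ctx->num_dirty; i++) {
        unsigned int state = ctx->dirty_list[i];
        state_table[state](state, dev, ctx);
        ctx->is_dirty[state] = 0;
    }
    ctx->num_dirty = 0;
}
```

With this layout, setting the same state twice before a draw costs one 
handler call per context, and a context that is not drawing keeps its dirty 
list until context_apply_dirty is run for it.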

I have no clean patches right now (45 chaotic patches), so I decided to share 
my wined3d directory. However, even compressed it is a bit big for a 
mailing list, so I uploaded it to 
http://stud4.tuwien.ac.at/~e0526822/wined3d-statemgmt.tar.bz2

Stefan
