WineD3D state management
Stefan Dösinger
stefandoesinger at gmx.at
Mon Nov 27 04:20:11 CST 2006
Hi,
In the past days I've been hacking on implementing my state management ideas,
and I think I've come to a state where I don't have to be completely ashamed
of my patches :-)
First, what the code does NOT do yet:
* Pixel Shaders, GLSL shaders: I only had my notebook with the M9 available,
so I had no chance to implement them. Expect anything from broken graphics to
the sudden release of Duke Nukem Forever if you try to use them.
* Stateblocks
* Register combiners: Disabled right now
* Offscreen rendering: Causes random rendering garbage
* 2D Blits: Commented out
I have described the basic ideas in earlier mails
(http://www.winehq.org/pipermail/wine-devel/2006-October/051868.html),
so I won't describe them here again. I pretty much followed the original
plan.
Performance:
One of the aims was to get better performance, since we apparently lost
performance due to excessive state changes, which eat CPU time and may require
CPU-GPU syncs. My patches improve performance, but not as much as I
originally hoped. I mainly have performance figures on the M9, and some basic
testing on a gf7600.
* Billboard dx8 sdk demo: got from 56fps to 107 fps :-)
* Half-Life 1: Quite an improvement here too. 110->150 fps in one of my
timedemos. The d3d renderer now outperforms the opengl renderer(140 fps).
Both the billboard demo and hl1 hit a special rendering case (no stream source
or fvf changes), which is nicely optimized by my changes. The gl renderer in
hl1 uses immediate mode drawing while wined3d can use VBOs and array drawing,
thus being faster on today's cards.
* Battlefield 1942: Slight improvement too, 32->37 fps on my testing
scene(spawn point on a u.s. carrier at full graphics). BF1942 exceeded the usual
linux/windows driver performance ratio already before, so I assume I'm pretty
much at the limit of my M9 here.
* 3DMark2000: Unfortunately my driver crashes it before showing the scores, so
I can only watch the in-test counter. Seems to get +5 to +10 fps in the low
detail helicopter test(resolution independent). Native msvcrt.dll gets
another +5 fps.
I did only a short test on my geforce7600:
* 3dmark2000: gets 11500 3dmarks, or 14500 when forcing drawStridedFast. This
is, I believe, the windows performance. However, the benchmark is too old to be
meaningful. Before my state patches the drawStridedFast score was around 13500
if I remember correctly; I have to retest.
* 3dmark2001: Low detail tests run at 150-300 fps, too fast for a meaningful
result. high detail tests are slow and partially broken due to offscreen
rendering.
* Battlefield 1942: Runs at steady 100fps, but it did that already before
So it seems that the state patches improve one bottleneck, but we still have
many others (offscreen rendering, drawStridedSlow) left. The nvidia profiling
driver may help here.
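For readers who have not seen the earlier mails, here is a minimal sketch of how a dirty states list can cut down redundant state changes. It assumes the design hinted at further down in this mail (a dirty states list plus a States[i].func style handler table). All names (struct device, set_state, draw, gl_calls) are made up for illustration and are not the actual wined3d code; a counter stands in for the real GL calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the actual wined3d code. */
#define STATE_HIGHEST 4

struct device;
typedef void (*state_func)(unsigned int state, struct device *dev);

struct device {
    unsigned int value[STATE_HIGHEST + 1];       /* last value set by the app */
    unsigned char is_dirty[STATE_HIGHEST + 1];   /* 1 if already on the dirty list */
    unsigned int dirty_list[STATE_HIGHEST + 1];  /* states waiting to be applied */
    unsigned int dirty_count;
    unsigned int gl_calls;                       /* stands in for real GL calls */
};

/* A real handler would translate dev->value[state] into GL calls. */
static void apply_generic(unsigned int state, struct device *dev)
{
    (void)state;
    dev->gl_calls++;
}

/* One handler per state; index 0 is unused, as in the loop in this mail. */
static const state_func state_table[STATE_HIGHEST + 1] = {
    NULL, apply_generic, apply_generic, apply_generic, apply_generic
};

/* Setting a state only records the value and marks the state dirty. */
static void set_state(struct device *dev, unsigned int state, unsigned int value)
{
    assert(state >= 1 && state <= STATE_HIGHEST);
    dev->value[state] = value;
    if (!dev->is_dirty[state]) {
        dev->is_dirty[state] = 1;
        dev->dirty_list[dev->dirty_count++] = state;
    }
}

/* Before drawing, each dirty state is applied exactly once. */
static void draw(struct device *dev)
{
    unsigned int i;
    for (i = 0; i < dev->dirty_count; i++) {
        unsigned int state = dev->dirty_list[i];
        state_table[state](state, dev);
        dev->is_dirty[state] = 0;
    }
    dev->dirty_count = 0;
}
```

With this layout, setting the same state several times between two draws costs one handler call instead of one per set, and a draw with no state changes in between costs none.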
Where to go from here:
The state management was also planned to make implementing other features
easier:
* Multithreading: Make the dirty states list per context, and likewise the
helpers that are stored in the device. Before applying the states, activate the
correct ctx for the thread.
* Stateblocks: Basic idea is to record a display list and call it:
    glNewList(stateblock->listname, GL_COMPILE);
    for(i = 1; i <= STATE_HIGHEST; i++) {
        States[i].func(i, stateblock);
    }
    glEndList();
To apply the stateblock: glCallList(stateblock->listname);
Ok, we need to split the list to apply only partial states, and the for loop
can be improved to create a more efficient list. When the stateblock is
altered we have to recreate the list. That's the basic idea...
* Offscreen rendering: Depends on whether we need separate contexts for
pbuffers. If yes, include it with the multithreading ctx finding, then apply
the states; otherwise I think we can make selecting the pbuffer/fbo a state
like all others. Has interactions with the viewport (I think) and the
projection matrix (render_offscreen for upside down rendering).
* sRGB textures: Dirtifies the sampler. All textures now have information
about how many samplers they are bound to, and the number of one of the
samplers. Phil?
* Vertex samplers: Ivan said he'd need the state management for them. My idea
is to build a d3d sampler - gl sampler mapping in SetTexture, which will be
needed for register combiners too. Based on that we can bind vtf samplers in
gl.
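To make the multithreading item above a bit more concrete, here is a rough sketch of per-context dirty tracking. It is only an illustration under my own assumptions: the names (struct mt_device, struct gl_context, device_set_state, device_draw) are hypothetical, the number of contexts is fixed, and a counter replaces the real GL calls and the real context switch.

```c
#include <assert.h>

/* Hypothetical names; a sketch of per-context dirty tracking only. */
#define STATE_HIGHEST 4
#define MAX_CONTEXTS 2

struct gl_context {
    unsigned char dirty[STATE_HIGHEST + 1]; /* per-context dirty flags */
    unsigned int applied;                   /* counts simulated GL state applications */
};

struct mt_device {
    unsigned int value[STATE_HIGHEST + 1];  /* d3d-side state, shared by all contexts */
    struct gl_context ctx[MAX_CONTEXTS];
};

/* A state change dirtifies the state in every context, since each
 * context has its own copy of the GL state. */
static void device_set_state(struct mt_device *dev, unsigned int state,
                             unsigned int value)
{
    unsigned int c;
    assert(state >= 1 && state <= STATE_HIGHEST);
    dev->value[state] = value;
    for (c = 0; c < MAX_CONTEXTS; c++)
        dev->ctx[c].dirty[state] = 1;
}

/* Drawing picks the context for the calling thread first, then applies
 * only what is dirty in that context. */
static void device_draw(struct mt_device *dev, unsigned int ctx_index)
{
    struct gl_context *ctx;
    unsigned int state;
    assert(ctx_index < MAX_CONTEXTS);
    ctx = &dev->ctx[ctx_index];
    /* A real implementation would make this GL context current here. */
    for (state = 1; state <= STATE_HIGHEST; state++) {
        if (ctx->dirty[state]) {
            ctx->applied++;   /* stands in for calling the state handler */
            ctx->dirty[state] = 0;
        }
    }
}
```

The point of the sketch is that a context which was not active when a state changed still picks the change up the next time it draws, while the context that already applied it does no extra work.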
I have no clean patches right now (45 chaotic patches), so I decided to share
my wined3d directory. However, even compressed this is a bit big for a
mailing list, so I uploaded it to
http://stud4.tuwien.ac.at/~e0526822/wined3d-statemgmt.tar.bz2
Stefan