D3D performance debugging report

Stefan Dösinger stefan at codeweavers.com
Wed Apr 27 12:27:17 CDT 2011


I spent a few hours debugging wined3d performance today. No, I found no magic 
fix for the slowness, just some semi-usable data.

First I wrote a hacky patch to avoid redundant FBO applications. This gave a 
tiny, tiny performance increase, see http://www.winehq.org/pipermail/wine-

The main investigation concerned redundant shader applications. The aim was to 
find out how many of our glBindProgramARB calls are re-binding the same 
program, and how much this costs. Depending on the game, between 20% and 90% of 
all BindProgram calls are redundant. I'll attach my debug hack so others can 
test their own apps. I used ARB shaders for testing because they can apply 
vertex and fragment programs separately.

This brings up two questions:
(a) How much does this cost?
(b) Why does this happen?

The costs: In my draw overhead tester, hacking out the redundant apply calls 
improved performance a lot, from about 101 fps to 157 fps. The biggest part of 
that comes from the GL calls. Without them, but with the remaining shader logic 
still in place, I get 144 fps.

Unfortunately this does not translate to any performance gains in real apps. I 
tried to filter out the redundant apply calls in the simplest way possible: 
Track the current value per wined3d_context and check before calling 
glBindProgramARB. This gave the 144 fps in the draw overhead tester, but no 
measurable increase in any other apps (I tested StarCraft 2, HL2, Team Fortress 
2, World in Conflict and a few others).
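
A minimal sketch of that kind of per-context filter, for illustration only. 
The names (struct ctx, bind_program_cached, fake_bind_program) are 
hypothetical; real code would keep the cache in struct wined3d_context and 
call glBindProgramARB where the stub is called here.

```c
/* Hypothetical stand-ins so the sketch compiles without GL headers. */
enum prog_target { TARGET_VERTEX, TARGET_FRAGMENT, TARGET_COUNT };

struct ctx
{
    /* Last program id bound per target. Zero-initialised, which matches
     * GL's default program object 0 being bound initially. */
    unsigned int bound[TARGET_COUNT];
    unsigned int issued, skipped;   /* statistics only */
};

/* Stub standing in for the real glBindProgramARB call. */
static void fake_bind_program(enum prog_target target, unsigned int id)
{
    (void)target;
    (void)id;
}

/* Bind only if the id differs from what this context bound last. */
static void bind_program_cached(struct ctx *ctx, enum prog_target target,
        unsigned int id)
{
    if (ctx->bound[target] == id)
    {
        ++ctx->skipped;             /* redundant bind filtered out */
        return;
    }
    fake_bind_program(target, id);
    ctx->bound[target] = id;
    ++ctx->issued;
}
```

The skipped/issued counters give the redundancy ratio for an app as a side 
effect. The cache is per context because the GL binding is per-context state.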

Given the number of redundant apply calls and their cost in the draw overhead 
tester, I had expected at least some improvement. Certainly not a 50% 
performance increase (the draw overhead tester performs no shader changes at 
all in the draw loop), but at least a 2-3% gain. So far I have no explanation 
why I didn't see that.

But why do those redundant apply calls happen? It seems like the state 
dirtification comes all the way from the stream sources and/or vertex 
declaration. STREAMSRC is linked to VDECL, which is linked to VERTEXSHADER, 
which in turn reapplies the pixel shader. This means redundant vertex and 
pixel shader applications. Separating those states will be a major challenge.
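
To illustrate the chain, here is a simplified model of it. This is not the 
actual wined3d state table; the enum, the linked_to array and 
states_reapplied() are made up for the sketch. It only shows how dirtying one 
state drags in everything further down the chain.

```c
enum state
{
    ST_STREAMSRC,
    ST_VDECL,
    ST_VERTEXSHADER,
    ST_PIXELSHADER,
    ST_COUNT            /* also used as "no further link" */
};

/* linked_to[s] is the state that gets reapplied together with s. */
static const enum state linked_to[ST_COUNT] =
{
    [ST_STREAMSRC]    = ST_VDECL,
    [ST_VDECL]        = ST_VERTEXSHADER,
    [ST_VERTEXSHADER] = ST_PIXELSHADER,
    [ST_PIXELSHADER]  = ST_COUNT,
};

/* Bitmask of every state that ends up reapplied when 'changed' is dirtied. */
static unsigned int states_reapplied(enum state changed)
{
    unsigned int mask = 0;
    enum state s;

    for (s = changed; s != ST_COUNT; s = linked_to[s])
        mask |= 1u << s;
    return mask;
}
```

In this model a plain stream source change yields all four bits, i.e. both 
shaders get reapplied even though neither of them changed.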

The vdecl<->vshader link shouldn't be needed any more, except in rare cases 
where GL_ARB_vertex_array_bgra is not supported and the application switches 
one attribute from D3DDECL_D3DCOLOR to a non-d3dcolor attribute. If the vertex 
shader changes we still have to reparse the vertex declaration and reapply the 
stream sources because the vshader determines the stream numbers. Maybe we can 
reduce the number of times this happens by ordering stream usages and indices 
to make sure shaders with compatible input get the same stream ordering.

vdecl and streamsrc are closely related. If the vdecl is changed we have to 
reapply the stream sources. The other way around shouldn't cause problems 
though: when a stream source changes, only the changed streams need to be 
reapplied, and there's no need to reapply the vertex shader.
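
One possible way to express that asymmetry is a per-stream dirty mask plus a 
separate vdecl flag. This is a sketch under assumptions: struct stream_state 
and the functions below are hypothetical, and MAX_STREAMS is set to 16 just 
for the example.

```c
#define MAX_STREAMS 16  /* assumption for this sketch */

struct stream_state
{
    unsigned int dirty;                 /* bit i set: stream i changed */
    int vdecl_dirty;                    /* declaration changed */
    unsigned int applied[MAX_STREAMS];  /* how often each stream was applied */
};

/* A stream source change only marks that one stream. idx < MAX_STREAMS. */
static void set_stream_source(struct stream_state *s, unsigned int idx)
{
    s->dirty |= 1u << idx;
}

/* A declaration change forces every stream to be reapplied. */
static void set_vdecl(struct stream_state *s)
{
    s->vdecl_dirty = 1;
}

/* Reapply what is needed; returns the number of streams touched. */
static unsigned int apply_streams(struct stream_state *s)
{
    unsigned int mask = s->vdecl_dirty ? (1u << MAX_STREAMS) - 1 : s->dirty;
    unsigned int i, count = 0;

    for (i = 0; i < MAX_STREAMS; ++i)
    {
        if (mask & (1u << i))
        {
            ++s->applied[i];    /* stand-in for the real GL pointer setup */
            ++count;
        }
    }
    s->dirty = 0;
    s->vdecl_dirty = 0;
    return count;
}
```

Nothing in apply_streams() touches the vertex shader, which is the point: the 
streamsrc -> vdecl direction of the link goes away, the vdecl -> streamsrc 
direction stays.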

The vertex and pixel shader are linked for a few reasons: The shader backend 
API offers only a function to set both. Basic GLSL only offers a function to 
set both at once (GL_ARB_separate_shader_objects changes that). And even in ARB 
the pixel shader input may require some changes in the vertex shader output to 
get Shader Model 3.0 varyings right.

The shader backend API can be changed, but it has to be done in a way that 
doesn't hurt GLSL without ARB_separate_shader_objects. If we have classic GLSL 
we have to keep the link. With ARB we can conditionally reapply the vertex 
shader if the ps_input_signature is changed.
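
A rough sketch of that decision. Everything here is hypothetical: struct 
shader_state, select_pixel_shader() and the idea of comparing the input 
signature as a single integer id are assumptions made for the example, not 
the existing shader backend API.

```c
enum { REAPPLY_VS = 1, REAPPLY_PS = 2 };

struct shader_state
{
    unsigned int ps_id;         /* currently selected pixel shader */
    unsigned int ps_input_sig;  /* id/hash of its input signature (assumed) */
    int arb_programs;           /* 1: ARB programs, 0: classic GLSL */
};

/* Select a pixel shader and report which stages must be reapplied. */
static unsigned int select_pixel_shader(struct shader_state *st,
        unsigned int ps_id, unsigned int input_sig)
{
    unsigned int reapply;

    if (st->ps_id == ps_id && st->ps_input_sig == input_sig)
        return 0;                       /* nothing changed */

    reapply = REAPPLY_PS;
    /* Classic GLSL sets both stages at once, so the link stays.
     * With ARB the vertex shader only follows if the inputs changed. */
    if (!st->arb_programs || st->ps_input_sig != input_sig)
        reapply |= REAPPLY_VS;

    st->ps_id = ps_id;
    st->ps_input_sig = input_sig;
    return reapply;
}
```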

To complicate matters, there are additional states that affect the shaders, 
such as fog, textures and clipping. We don't keep track of those dependencies.

So it's a lot of work to clean up these state dependencies and we don't know 
how much it'll gain us :-(

-------------- next part --------------
A non-text attachment was scrubbed...
Name: shaderdebug.diff
Type: text/x-patch
Size: 8099 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20110427/1c49c883/attachment.bin>