[1/3] WineD3D: Implement a different constant dirtification algorithm

Stefan Dösinger stefan at codeweavers.com
Mon Mar 3 19:30:23 CST 2008


The additional device vtable this patch adds has raised a few concerns, so 
I'll explain my considerations:

Using a different implementation of Set*ShaderConstantF is purely a 
performance consideration. Obviously an if() statement or a shader model 
callback would give the same separation of the dirtifications. However, the 
main problem with constant setting isn't that setting a constant is 
expensive; it is that the whole code is called so often by the d3d app. 
With the vtable we already have an indirection layer which we can't get rid 
of anyway, so we may as well use it as well as possible.

Which codepath is chosen is determined at device creation time, so using the 
vtable to select one of the functions makes the selection a one-time O(1) 
cost, rather than an O(n) cost paid again on each of the n calls. 
Determining this at call time would not only reduce the benefit of this 
patchset, but also make the already rather burdened GLSL constant loading 
even slower, with currently no gain for GLSL.
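
As a rough illustration of the idea (this is not the code from the patch), 
the sketch below picks a function table once when the device is set up, so 
the per-call path contains no backend check. All names here (struct device, 
device_ops, device_init, reload_all, and so on) are hypothetical and only 
model the approach; the real WineD3D structures differ.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONSTANTS 256

struct device;

/* Hypothetical function table standing in for the device vtable. */
struct device_ops {
    void (*set_vs_constants_f)(struct device *dev, unsigned start,
                               const float *data, unsigned count);
};

struct device {
    const struct device_ops *ops;
    float vs_constants[MAX_CONSTANTS][4];
    unsigned char dirty[MAX_CONSTANTS]; /* per-constant flags (first path) */
    int reload_all;                     /* coarse flag (second path) */
};

/* Shared part: copy the new values into the device's shadow copy. */
static void store_constants(struct device *dev, unsigned start,
                            const float *data, unsigned count)
{
    assert(start <= MAX_CONSTANTS && count <= MAX_CONSTANTS - start);
    memcpy(dev->vs_constants[start], data, count * 4 * sizeof(float));
}

/* Path 1: mark each touched constant in a flag array. */
static void set_constants_flag_array(struct device *dev, unsigned start,
                                     const float *data, unsigned count)
{
    store_constants(dev, start, data, count);
    memset(dev->dirty + start, 1, count);
}

/* Path 2: a placeholder for a backend with different dirty tracking. */
static void set_constants_generic(struct device *dev, unsigned start,
                                  const float *data, unsigned count)
{
    store_constants(dev, start, data, count);
    dev->reload_all = 1;
}

static const struct device_ops flag_array_ops = { set_constants_flag_array };
static const struct device_ops generic_ops    = { set_constants_generic };

/* The backend decision is made exactly once, at creation time. */
void device_init(struct device *dev, int use_flag_array)
{
    memset(dev, 0, sizeof(*dev));
    dev->ops = use_flag_array ? &flag_array_ops : &generic_ops;
}

/* Hot path: one indirect call, no if() on the backend. */
void device_set_vs_constants(struct device *dev, unsigned start,
                             const float *data, unsigned count)
{
    dev->ops->set_vs_constants_f(dev, start, data, count);
}
```

The alternative would be a single setter that tests the backend on every 
call; the result is the same, but the test is repeated for each of the many 
calls the app makes.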

Henri and Chris had a few suggestions. One of them was that the current GLSL 
list could be used like my array as well, but the problem is that operating 
on the list is more expensive (not algorithmically, but still in terms of CPU 
cycles) than the operations on the array. The list could be changed to an 
array, as the current GLSL code could live with that, but I think that the 
GLSL code actually profits from the list, and we can optimize loading groups 
of dirty constants using the list. The main difference is that in ARB the 
dirty constant array is written as often as it is read, while in GLSL the 
list is pretty much read-only. After the first frame it should remain mostly 
untouched. So in GLSL we can implement some read optimizations when writing 
the list, while in ARB we should make writes as cheap as possible.
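
To make the trade-off concrete, here is a minimal sketch of the two kinds of 
bookkeeping, under the assumption that the array is a plain flag array and 
the list is a duplicate-free list of indices. The type and function names are 
made up for illustration and are not the WineD3D ones, and the "upload" steps 
are only comments.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONSTANTS 256

/* Write-optimized: marking is one store, loading scans every slot. */
struct dirty_array {
    unsigned char flag[MAX_CONSTANTS];
};

void dirty_array_init(struct dirty_array *a)
{
    memset(a, 0, sizeof(*a));
}

void dirty_array_mark(struct dirty_array *a, unsigned idx)
{
    assert(idx < MAX_CONSTANTS);
    a->flag[idx] = 1;
}

/* Clears the flags and returns how many constants needed loading. */
unsigned dirty_array_load(struct dirty_array *a)
{
    unsigned i, loaded = 0;
    for (i = 0; i < MAX_CONSTANTS; ++i) {
        if (a->flag[i]) {
            /* upload constant i here */
            a->flag[i] = 0;
            ++loaded;
        }
    }
    return loaded;
}

/* Read-optimized: marking does extra work to keep the list free of
 * duplicates, loading only visits the listed entries. */
struct dirty_list {
    unsigned short idx[MAX_CONSTANTS];
    unsigned count;
    unsigned char listed[MAX_CONSTANTS];
};

void dirty_list_init(struct dirty_list *l)
{
    memset(l, 0, sizeof(*l));
}

void dirty_list_mark(struct dirty_list *l, unsigned idx)
{
    assert(idx < MAX_CONSTANTS);
    if (l->listed[idx])
        return; /* already in the list, nothing to write */
    l->listed[idx] = 1;
    l->idx[l->count++] = (unsigned short)idx;
}

/* Walks the list without clearing it (an assumption of this sketch, so
 * that the list stays untouched once it has been built). */
unsigned dirty_list_load(const struct dirty_list *l)
{
    unsigned i, loaded = 0;
    for (i = 0; i < l->count; ++i) {
        /* upload constant l->idx[i] here */
        ++loaded;
    }
    return loaded;
}
```

With the array, a mark is a single byte store and the cost sits in the scan 
at load time; with the list, each mark pays for a lookup and a possible 
append, but a load touches only the entries that matter.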

Ken and I have done some performance testing of this patchset, and as with 
every 3D app the result depends on various factors. In GPU limited apps there 
is no gain. With the Linux Nvidia driver we gain a small improvement in CPU 
limited cases. In my HL2 timedemo I got 94 instead of 90 fps. On the Mac, 
where shader loading is much more expensive, the improvements are bigger. My 
HL2 timedemo went from 75 to 82 fps, which is almost a 10% improvement, and 
Ken reported an improvement from 90 to 105 fps. We tested some other games as 
well, but not all have useful benchmarking support, and the overall 
improvements were around 5-10%.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-WineD3D-Implement-a-different-constant-dirtificatio.patch.bz2
Type: application/x-bzip2
Size: 5870 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-patches/attachments/20080304/841771de/attachment-0001.bin 