[1/3] WineD3D: Implement a different constant dirtification algorithm
stefan at codeweavers.com
Mon Mar 3 19:30:23 CST 2008
The additional device vtable this patch adds has raised a few concerns, so
I'll explain my considerations:
Using a different implementation of Set*ShaderConstantF is purely a
performance consideration. Obviously an if() statement or a shader model
callback would give the same separation of the dirtifications. However, the
main problem with constant setting isn't that setting a constant is
expensive; it is that the whole code is called so often by the d3d app.
With the vtable we already have an indirection layer which we can't get rid
of anyway, so we may as well use it as well as possible.
Which codepath is chosen is determined at device creation time, so using the
vtable to select one of the functions makes the selection a one-time O(1)
cost, rather than an O(n) cost paid again on each of the n calls.
Determining this at call time would not only reduce the benefit of this
patchset, but also make the already rather burdened GLSL constant loading
even slower, for currently no gain in GLSL.
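
To illustrate the point about choosing the codepath once, here is a minimal
sketch in C. It is not the code from the patch: every name in it (struct
device, device_vtbl, set_constf_arb, set_constf_glsl, device_init) is
hypothetical, and the two setters only count calls where the real
backend-specific dirtification would go.

```c
#include <string.h>

#define MAX_CONSTANTS 256

struct device;

/* Hypothetical setter signature, loosely modelled on Set*ShaderConstantF. */
typedef void (*set_constf_fn)(struct device *dev, unsigned int start,
                              const float *data, unsigned int count);

/* Hypothetical per-device function table. */
struct device_vtbl {
    set_constf_fn set_vs_constf;
};

enum shader_backend { BACKEND_ARB, BACKEND_GLSL };

struct device {
    const struct device_vtbl *vtbl;
    float constants[MAX_CONSTANTS][4];
    unsigned int arb_calls;   /* stand-in for ARB-style dirtification */
    unsigned int glsl_calls;  /* stand-in for GLSL-style dirtification */
};

/* Shared part: store the values, ignoring anything past the array end. */
static void copy_constants(struct device *dev, unsigned int start,
                           const float *data, unsigned int count)
{
    unsigned int i, j;

    for (i = 0; i < count && start + i < MAX_CONSTANTS; ++i)
        for (j = 0; j < 4; ++j)
            dev->constants[start + i][j] = data[i * 4 + j];
}

static void set_constf_arb(struct device *dev, unsigned int start,
                           const float *data, unsigned int count)
{
    copy_constants(dev, start, data, count);
    dev->arb_calls++;   /* backend-specific dirtification would go here */
}

static void set_constf_glsl(struct device *dev, unsigned int start,
                            const float *data, unsigned int count)
{
    copy_constants(dev, start, data, count);
    dev->glsl_calls++;  /* backend-specific dirtification would go here */
}

static const struct device_vtbl arb_vtbl  = { set_constf_arb };
static const struct device_vtbl glsl_vtbl = { set_constf_glsl };

/* The backend is examined exactly once, at device creation time. */
static void device_init(struct device *dev, enum shader_backend backend)
{
    memset(dev, 0, sizeof(*dev));
    dev->vtbl = (backend == BACKEND_ARB) ? &arb_vtbl : &glsl_vtbl;
}
```

A caller then writes dev->vtbl->set_vs_constf(dev, start, data, count) and
pays only for the indirect call that the vtable requires anyway; no
per-call test of the backend is left in the hot path.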
Henri and Chris had a few suggestions. One of them was that the current GLSL
list could be used like my array as well, but the problem is that operating
on the list is more expensive (not algorithmically, but still in terms of CPU
cycles) than the operations on the array. The list could be changed to an
array as the current GLSL code could live with that, but I think that the
GLSL code actually profits from the list, and we can optimize loading groups
of dirty constants using the list. The main difference is that in ARB the
dirty constant array is written as often as it is read, and in GLSL the array
is pretty much read only. After the first frame it should remain mostly
untouched. So in GLSL we can implement some read optimizations when writing
the list, while in ARB we should make writes as cheap as possible.
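
The write-cheap versus read-cheap tradeoff can be sketched roughly as
follows. This is an illustration under my own assumptions, not the data
structures from the patch or from the current GLSL code: dirty_array,
dirty_list and their functions are hypothetical names, indices are assumed
to be below MAX_CONSTANTS, and the flush functions only count what would be
uploaded.

```c
#define MAX_CONSTANTS 256

/* Array flavour: marking a constant dirty is an unconditional store
 * plus a high-water-mark update. */
struct dirty_array {
    unsigned char dirty[MAX_CONSTANTS];
    unsigned int highest;   /* one past the highest index marked so far */
};

static void dirty_array_mark(struct dirty_array *a, unsigned int idx)
{
    a->dirty[idx] = 1;
    if (idx >= a->highest)
        a->highest = idx + 1;
}

/* Reading walks every slot below the high-water mark and clears the flags.
 * Returns how many constants would have been uploaded. */
static unsigned int dirty_array_flush(struct dirty_array *a)
{
    unsigned int i, n = 0;

    for (i = 0; i < a->highest; ++i) {
        if (a->dirty[i]) {
            a->dirty[i] = 0;
            ++n;
        }
    }
    a->highest = 0;
    return n;
}

/* List flavour: marking does more work (a membership test and an append),
 * but reading visits only the entries that are actually dirty. */
struct dirty_list {
    unsigned int indices[MAX_CONSTANTS];
    unsigned int count;
    unsigned char listed[MAX_CONSTANTS];
};

static void dirty_list_mark(struct dirty_list *l, unsigned int idx)
{
    if (l->listed[idx])
        return;             /* already queued, nothing to do */
    l->listed[idx] = 1;
    l->indices[l->count++] = idx;
}

/* Returns how many constants would have been uploaded and empties the list. */
static unsigned int dirty_list_flush(struct dirty_list *l)
{
    unsigned int i, n = l->count;

    for (i = 0; i < n; ++i)
        l->listed[l->indices[i]] = 0;
    l->count = 0;
    return n;
}
```

Both variants are O(1) per mark, which is the "not algorithmically" part;
the difference is only in how many instructions and branches each mark and
each flush costs.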
Ken and I have done some performance testing of this patchset, and as with
every 3D app the result depends on various factors. In GPU limited apps there
is no gain. With the Linux Nvidia driver we gain a small improvement in CPU
limited cases. In my HL2 timedemo I got 94 instead of 90 fps. On the Mac,
where shader loading is much more expensive, the improvements are bigger. My
HL2 timedemo went from 75 to 82 fps, which is almost a 10% improvement, and
Ken reported an improvement from 90 to 105 fps. We tested some other games as
well, but not all have useful benchmarking support, and the overall
improvements were around 5-10%.
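
For reference, the relative gains behind the fps figures above can be
recomputed with a small helper. percent_gain is a hypothetical name used
only for this sketch.

```c
/* Relative improvement, in percent, when going from `before` to `after`
 * frames per second. `before` must be non-zero. */
static double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers quoted above, percent_gain(90, 94) is about 4.4,
percent_gain(75, 82) is about 9.3, and percent_gain(90, 105) is about 16.7.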
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 5870 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-patches/attachments/20080304/841771de/attachment-0001.bin