d3dx9: Implement D3DXSHMultilpy5

Nozomi Kodama nozomi.kodama at yahoo.com
Sat Apr 20 13:49:43 CDT 2013


These bunch of define are pretty ugly anyway. I prefer a pretty duplicated code rather than these tons of defines.
Define should be avoided as much as possible.



________________________________

On 15.04.2013 02:10, Nozomi Kodama wrote:
> Hello
> 
> thanks for the review.
> I don't think that calling defines is the way to go. Indeed, I tested my patch and yours. Yours is about 12% slower than mine in my computer.
> And now, we try to take care of efficiency of this dll. So, it is not the time  to increase latency.
> 
You are right. I'm not really sure why the code should be slower though. The #defines shouldn't have an impact on the performance, but it might be because it is translated to:

ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8];
tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8];
out[1] = 0.0f + ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] = out[0] + 0.28209479f * t;
out[6] = 0.0f + -0.12615662f * t;
out[8] = 0.0f + -0.21850968f * t;

instead of:
ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
out[1] = ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] += 0.28209479f * t;
out[6] = -0.12615662f * t;
out[8] = -0.21850968f * t;

May be due to "out[8] = -0.21850968f * t;" vs "out[8] = 0.0f + -0.21850968f * t;":
1. the extra 0.0f - Shouldn't the 0.0f get optimized away? Imho the macro could be fixed (e.g. use an if / separate macro / don't use a macro for this cases).
2. "+ -0.21850968f * t;" should be as fast as "-0.21850968f * t" isn't it?

Does anyone else have an objection about this? I just feel that the code size and the always reused constants could be written a little bit nicer. I'm not sure if the macro usage is really better, it was just a quick idea to avoid that much code duplication. Also changes like the suggested below are really error prone, because they change a lot of code mostly using copy and paste. Though I vote for the same style and precision usage in all cases.

Hope this helps a bit
Rico

> 
> I used 10 digits since there are a lot of computation, I want to avoid as much as possible big rounding errors. If we want to uniformize, then we should uniformize d3dxshmultiply 2,3,4 with 10 digits.
> But that is for an another patch.
> 
> Nozomi.
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20130420/76c2b6e2/attachment-0001.html>


More information about the wine-devel mailing list