d3dx9: Implement D3DXSHMultilpy5

Rico Schüller kgbricola at web.de
Mon Apr 15 03:24:28 CDT 2013


On 15.04.2013 02:10, Nozomi Kodama wrote:
> Hello
>
> thanks for the review.
> I don't think that calling defines is the way to go. Indeed, I tested my patch and yours. Yours is about 12% slower than mine in my computer.
> And now, we try to take care of efficiency of this dll. So, it is not the time  to increase latency.
>
You are right. I'm not really sure why the code should be slower though. 
The #defines shouldn't have an impact on the performance, but it might 
be because it is translated to:

ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8];
tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8];
out[1] = 0.0f + ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] = out[0] + 0.28209479f * t;
out[6] = 0.0f + -0.12615662f * t;
out[8] = 0.0f + -0.21850968f * t;

instead of:
ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
out[1] = ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] += 0.28209479f * t;
out[6] = -0.12615662f * t;
out[8] = -0.21850968f * t;

May be due to "out[8] = -0.21850968f * t;" vs "out[8] = 0.0f + 
-0.21850968f * t;":
1. the extra 0.0f - Shouldn't the 0.0f get optimized away? Imho the 
macro could be fixed (e.g. use an if / separate macro / don't use a 
macro for this cases).
2. "+ -0.21850968f * t;" should be as fast as "-0.21850968f * t" isn't it?

Does anyone else have an objection about this? I just feel that the code 
size and the always reused constants could be written a little bit 
nicer. I'm not sure if the macro usage is really better, it was just a 
quick idea to avoid that much code duplication. Also changes like the 
suggested below are really error prone, because they change a lot of 
code mostly using copy and paste. Though I vote for the same style and 
precision usage in all cases.

Hope this helps a bit
Rico

>
> I used 10 digits since there are a lot of computation, I want to avoid as much as possible big rounding errors. If we want to uniformize, then we should uniformize d3dxshmultiply 2,3,4 with 10 digits.
> But that is for an another patch.
>
> Nozomi.
>
>




More information about the wine-devel mailing list