d3dx9: Avoid expensive computations

Mon Feb 25 05:26:58 CST 2013

On 25.02.2013 11:08, Henri Verbeet wrote:
> On 25 February 2013 10:24, Rico Schüller <kgbricola at web.de> wrote:
>> I ran some small speed tests, with the following results. You could also
>> avoid the many variable assignments such as *pout = out and use 4 vecs
>> instead. That should save ~48 assignments and improve the speed a bit
>> more (~10%). Native is still 40% faster than that, though.
>>
> I'd somewhat expect native to use SSE versions of this kind of thing
> when the CPU supports those instructions. You also generally want to
> pay attention to the order in which you access memory, although
> perhaps it doesn't matter so much here because an entire matrix should
> be able to fit in a single cacheline, provided it's properly aligned.
>
Is there a reason why we don't use SSE instructions? Or has just no one
had a look at it yet?
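
To make the question concrete, here is a rough sketch of what an SSE path for a 4x4 multiply could look like. It is not what d3dx9 or native does, only an illustration: struct mat4 and mat4_mul are made-up names standing in for D3DXMATRIX and D3DXMatrixMultiply, the matrix is assumed to be row-major floats, and the SSE branch is chosen at compile time via __SSE__ rather than by checking the CPU at runtime. The result is built in a local and copied out once, so out may alias an input.

```c
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for D3DXMATRIX: row-major 4x4 floats. */
struct mat4
{
    float m[4][4];
};

/* out = a * b. The result is accumulated in a local, so out may alias a or b. */
static void mat4_mul(struct mat4 *out, const struct mat4 *a, const struct mat4 *b)
{
    struct mat4 r;
    int i;
#ifdef __SSE__
    /* Unaligned loads: no 16-byte alignment is assumed for the inputs. */
    __m128 b0 = _mm_loadu_ps(b->m[0]);
    __m128 b1 = _mm_loadu_ps(b->m[1]);
    __m128 b2 = _mm_loadu_ps(b->m[2]);
    __m128 b3 = _mm_loadu_ps(b->m[3]);

    for (i = 0; i < 4; ++i)
    {
        /* Row i of the result is a linear combination of the rows of b. */
        __m128 row = _mm_mul_ps(_mm_set1_ps(a->m[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a->m[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a->m[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a->m[i][3]), b3));
        _mm_storeu_ps(r.m[i], row);
    }
#else
    int j;

    /* Scalar fallback: same row-by-row access order as the SSE branch. */
    for (i = 0; i < 4; ++i)
    {
        for (j = 0; j < 4; ++j)
        {
            r.m[i][j] = a->m[i][0] * b->m[0][j] + a->m[i][1] * b->m[1][j]
                    + a->m[i][2] * b->m[2][j] + a->m[i][3] * b->m[3][j];
        }
    }
#endif
    *out = r;
}
```

Both branches walk the matrices row by row, which matches the point about memory access order. Whether this actually closes the gap to native would have to be measured.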