[PATCH] msvcrt: Import memmove from musl
Gabriel Ivăncescu
gabrielopcode at gmail.com
Wed Aug 26 10:19:40 CDT 2020
On 26/08/2020 17:01, Gabriel Ivăncescu wrote:
> On 25/08/2020 20:15, Piotr Caban wrote:
>> On 8/22/20 5:10 PM, Gabriel Ivăncescu wrote:
>>> I understand `rep movsl` is faster even in the first test than `rep
>>> movsb`?
>> No, it was faster in "Non-aligned", "Aligned overlap" and "Non-aligned
>> overlap" tests. In the "Aligned" case the performance was identical no
>> matter if movsb or movsl was used.
>>
>> I'm also attaching simple sse2 implementation for comparison. It's
>> faster than the previous one on my machine. I'm also attaching results
>> from running the test on Windows (in VM).
>>
>> Thanks,
>> Piotr
>
> In most cases, the SSE version performs very well, in fact slightly
> better than the Windows implementation, and does very well for small moves.
>
> Unfortunately, for some reason, it seems it's quite significantly slower
> (20% or more) only on the "non-overlapped" case. Attached results.
>
> Thanks,
> Gabriel
Also, sorry, I forgot to mention one small thing: is there a reason you're
using movdq(a|u) instead of movaps/movups (which are also SSE1, not
SSE2)? They have a shorter encoding, which should help the instruction
cache very slightly, and no CPU cares about float vs. int state when it
is doing only moves. (Even if some broken CPU did have a false dependency
on that state, most SSE operations tend to be on floats anyway, but I
doubt any such CPU exists.)
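
To illustrate the point, here is a minimal sketch in C, not the code from the patch. The helper name copy_fwd_sse1 is made up, and it only covers the forward direction (destination below source, or no overlap). It uses only the SSE1 intrinsics _mm_loadu_ps/_mm_storeu_ps, which compilers typically emit as movups. For reference, movups encodes as 0F 10 and movaps as 0F 28, while movdqu is F3 0F 6F and movdqa is 66 0F 6F, so the legacy-encoded float forms are one byte shorter per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 only: _mm_loadu_ps / _mm_storeu_ps */

/* Hypothetical helper, not Wine's implementation: forward copy in
 * 16-byte chunks. Like the forward path of memmove, it is only valid
 * when dst is below src or the two ranges do not overlap. */
static void copy_fwd_sse1(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d <= s || d >= s + n);  /* forward direction must be safe */

    while (n >= 16)
    {
        /* Unaligned load/store of 16 raw bytes. The move does not
         * interpret the bits, so non-float data is copied unchanged. */
        __m128 v = _mm_loadu_ps((const float *)s);
        _mm_storeu_ps((float *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }

    while (n--)  /* remaining tail, one byte at a time */
        *d++ = *s++;
}
```

Swapping in _mm_loadu_si128/_mm_storeu_si128 from emmintrin.h would give the movdqu variant with the same behaviour, so the two are easy to compare in the benchmark.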