[PATCH] msvcrt: SSE2 implementation of memcmp for x86_64.

Elaine Lefler elaineclefler at gmail.com
Fri Apr 1 23:44:37 CDT 2022


On Fri, Apr 1, 2022 at 7:13 AM Jan Sikorski <jsikorski at codeweavers.com> wrote:
>
> Signed-off-by: Jan Sikorski <jsikorski at codeweavers.com>
> ---
> It's about 13x faster on my machine than the byte version.
> memcmp performance is important to wined3d, where it's used to find
> pipelines in the cache, and the keys are pretty big.

It should be noted that SSE2 also exists on 32-bit processors, and in
this same file you can find usage of "sse2_supported", which would
let you use this code path on i386 as well. You can put
__attribute__((target("sse2"))) on the declaration of sse2_memcmp to
allow GCC to emit SSE2 instructions even when the file's target
architecture would otherwise forbid it.
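
To make that concrete, here is a rough sketch of what I mean. It is not
taken from the patch: sketch_sse2_memcmp and sketch_memcmp are made-up
names, and __builtin_cpu_supports("sse2") is only a stand-in for the
existing "sse2_supported" check in the file. It assumes GCC or Clang.

```c
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
#include <emmintrin.h>

/* Hypothetical SSE2 path. The target attribute lets the compiler emit
 * SSE2 here even when the file itself is built without -msse2. */
static __attribute__((target("sse2")))
int sketch_sse2_memcmp(const void *ptr1, const void *ptr2, size_t n)
{
    const unsigned char *p1 = ptr1, *p2 = ptr2;

    while (n >= 16)
    {
        __m128i a = _mm_loadu_si128((const __m128i *)p1);
        __m128i b = _mm_loadu_si128((const __m128i *)p2);
        /* One mask bit per equal byte; 0xffff means all 16 bytes match. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xffff) break;
        p1 += 16; p2 += 16; n -= 16;
    }
    /* Byte loop handles the tail and locates the first differing byte. */
    for (; n; n--, p1++, p2++)
        if (*p1 != *p2) return *p1 < *p2 ? -1 : 1;
    return 0;
}
#endif

/* Hypothetical dispatcher: SSE2 path when the CPU has it, bytes otherwise. */
int sketch_memcmp(const void *ptr1, const void *ptr2, size_t n)
{
    const unsigned char *p1 = ptr1, *p2 = ptr2;

#if defined(__i386__) || defined(__x86_64__)
    /* Stand-in for the file's own runtime check (assumption). */
    if (__builtin_cpu_supports("sse2"))
        return sketch_sse2_memcmp(ptr1, ptr2, n);
#endif
    for (; n; n--, p1++, p2++)
        if (*p1 != *p2) return *p1 < *p2 ? -1 : 1;
    return 0;
}
```

The point is only that the attribute goes on the one function that
uses the intrinsics, so the rest of the file keeps its normal target.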

I think this could be even faster if you forced ptr1 to be 16-byte
aligned by byte-comparing up to ((ptr1 + 15) & ~15) at the beginning.
You can't reasonably force-align both pointers, but aligning at least
one should give measurably better performance.
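
Roughly what I have in mind for the alignment, as an untested-against-
your-patch sketch. sketch_aligned_memcmp is a made-up name, and I'm
assuming the SSE2 intrinsics from emmintrin.h; when SSE2 is not enabled
at compile time the sketch just falls through to the byte loop.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical memcmp that aligns the first pointer before the
 * 16-byte loop. */
int sketch_aligned_memcmp(const void *ptr1, const void *ptr2, size_t n)
{
    const unsigned char *p1 = ptr1, *p2 = ptr2;

    /* Bytes needed to reach the next 16-byte boundary of p1,
     * i.e. ((p1 + 15) & ~15) - p1, capped at n. */
    size_t head = (size_t)(-(uintptr_t)p1 & 15);
    if (head > n) head = n;
    n -= head;

    /* Byte-compare the unaligned head. */
    for (; head; head--, p1++, p2++)
        if (*p1 != *p2) return *p1 < *p2 ? -1 : 1;

#ifdef __SSE2__
    while (n >= 16)
    {
        /* p1 is 16-byte aligned here, so an aligned load is safe. */
        __m128i a = _mm_load_si128((const __m128i *)p1);
        /* p2 may still be misaligned, so it needs an unaligned load. */
        __m128i b = _mm_loadu_si128((const __m128i *)p2);
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xffff) break;
        p1 += 16; p2 += 16; n -= 16;
    }
#endif

    /* Tail, and the block that contained a mismatch (if any). */
    for (; n; n--, p1++, p2++)
        if (*p1 != *p2) return *p1 < *p2 ? -1 : 1;
    return 0;
}
```

Whether the aligned load actually wins anything will depend on the CPU,
so it would need benchmarking against the current version.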

I have a similar patch (labelled 230501 on
https://source.winehq.org/patches/ - not sure how to link the whole
discussion, sorry) which triggered a discussion about duplication
between ntdll and msvcrt. memcmp is also a function that appears in
both dlls. Do you have any input on that? (sorry if I'm out of line
for butting in here. I just noticed we're working on the same basic
thing)

- Elaine


