[PATCH 4/4] msvcrt: Add an SSE2 memset_aligned_32 implementation.

Piotr Caban piotr.caban at gmail.com
Mon Sep 13 09:50:22 CDT 2021


Hi Rémi,

I think you're undervaluing the SSE2 code path. While ERMS was introduced 
on Intel CPUs quite long ago, it's a fairly new thing on AMD CPUs (as 
far as I understand, the first AMD CPU to set the cpuid flag was released 
in mid-2019).
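
To make the point concrete, here is a minimal sketch of how the ERMS flag
could be read and fed into the size check. It assumes GCC or Clang with
<cpuid.h>; cpu_has_erms() and use_sse2_path() are hypothetical names for
illustration, not functions from the patch. ERMS is reported in CPUID leaf 7,
subleaf 0, EBX bit 9. The sketch also treats SSE2 support as a hard
requirement for the SSE2 path, which is an assumption of the sketch and
stricter than the i386 condition suggested further down.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical helper: true if the CPU advertises Enhanced REP MOVSB/STOSB
 * (CPUID.(EAX=7,ECX=0):EBX bit 9). Returns false on non-x86 builds. */
static bool cpu_has_erms(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 when leaf 7 is not available. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx >> 9) & 1;
#else
    return false;
#endif
}

/* Hypothetical dispatch predicate: prefer the SSE2 loop for small sizes,
 * and for every size when ERMS is missing. Assumption of this sketch:
 * the SSE2 loop is never chosen on a CPU without SSE2. */
static bool use_sse2_path(size_t n, bool sse2_supported, bool erms_supported)
{
    return sse2_supported && (n < 2048 || !erms_supported);
}
```

A caller would evaluate cpu_has_erms() once at startup, cache the result in
something like erms_supported, and then call use_sse2_path(n, sse2_supported,
erms_supported) per memset.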

On 9/13/21 2:23 PM, Rémi Bernon wrote:

> +#ifdef __i386__
> +    if (n < 2048 && sse2_supported)
if ((n < 2048 && sse2_supported) || !erms_supported)
> +#else
> +    if (n < 2048)
if (n < 2048 || !erms_supported)
> +#endif
> +    {
> +        __asm__ __volatile__ (
> +            "movd %1, %%xmm0\n\t"
> +            "pshufd $0, %%xmm0, %%xmm0\n\t"
> +            "test $0x20, %2\n\t"
> +            "je 1f\n\t"
> +            "sub $0x20, %2\n\t"
> +            "movdqa %%xmm0, 0x00(%0,%2)\n\t"
> +            "movdqa %%xmm0, 0x10(%0,%2)\n\t"
> +            "je 2f\n\t"
> +            "1:\n\t"
> +            "sub $0x40, %2\n\t"
> +            "movdqa %%xmm0, 0x00(%0,%2)\n\t"
> +            "movdqa %%xmm0, 0x10(%0,%2)\n\t"
> +            "movdqa %%xmm0, 0x20(%0,%2)\n\t"
> +            "movdqa %%xmm0, 0x30(%0,%2)\n\t"
> +            "ja 1b\n\t"
> +            "2:\n\t"
> +            :
> +            : "r"(d), "r"((uint32_t)v), "c"(n)
> +            : "memory"
> +        );
Shouldn't xmm0 be added to the clobbered registers list?
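
For illustration, a simplified sketch of what I mean, not the code from the
patch: fill_aligned_32() is a hypothetical helper that stores 32 bytes per
iteration and assumes d is 16-byte aligned and n is a nonzero multiple of 32.
It lists "xmm0" among the clobbers, and it declares the counter as a
read-write operand because the asm modifies it (also an assumption of the
sketch). On builds without SSE2 it falls back to plain memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at d with byte c.
 * Preconditions (assumed): d is 16-byte aligned, n is a nonzero
 * multiple of 32. */
static void fill_aligned_32(unsigned char *d, unsigned char c, size_t n)
{
#if defined(__GNUC__) && defined(__SSE2__)
    uint32_t v = c * 0x01010101u;   /* replicate the byte into 4 bytes */

    __asm__ __volatile__ (
        "movd %2, %%xmm0\n\t"            /* xmm0 = v in the low dword */
        "pshufd $0, %%xmm0, %%xmm0\n\t"  /* broadcast to all 4 dwords */
        "1:\n\t"
        "sub $0x20, %0\n\t"              /* n -= 32, sets the flags */
        "movdqa %%xmm0, 0x00(%1,%0)\n\t" /* stores leave flags untouched */
        "movdqa %%xmm0, 0x10(%1,%0)\n\t"
        "ja 1b\n\t"                      /* loop while n > 0 */
        : "+r"(n)                        /* counter is modified by the asm */
        : "r"(d), "r"(v)
        : "xmm0", "memory", "cc"         /* xmm0 is overwritten above */
    );
#else
    memset(d, c, n);
#endif
}
```

Without the "xmm0" entry the compiler is free to keep a live value in that
register across the asm statement, and the movd/pshufd pair would silently
overwrite it.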

Thanks,
Piotr


