[PATCH] msvcrt: Improve memset performance on i386 and x86_64 architectures.

Sat Sep 11 09:41:37 CDT 2021

On 9/11/21 8:51 AM, Piotr Caban wrote:
> Signed-off-by: Piotr Caban <piotr at codeweavers.com>
> ---
>   dlls/msvcrt/string.c | 126 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 126 insertions(+)
> 
> 

FWIW, as far as I can see from my simple throughput benchmarks, and with 
the default optimization flags (-O2), the unrolled C version:

* Outperforms the SSE2 assembly on x86_64 for n <= 32 (20GB/s vs 12GB/s 
for n = 32), and performs equally well for "aligned" operations on 
larger sizes.

* Performs at roughly a third of the throughput (25GB/s vs 70GB/s on my 
computer) on unaligned operations, like memset(dst + 1, c, n), with n >= 256.

* On i686, it performs equally well for small sizes (n <= 128), then 
drops to half the throughput (35GB/s vs 70GB/s) for aligned operations 
and to a third for unaligned ones.
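A minimal sketch of the kind of throughput measurement described above, assuming a POSIX system with `clock_gettime`. The helper name `memset_throughput`, the buffer sizes and the iteration counts are illustrative assumptions, not the benchmark that produced the numbers quoted here. Offset 0 stands for the "aligned" case and offset 1 for the memset(dst + 1, c, n) case.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same shape as memset(): fills n bytes of dst with (unsigned char)c. */
typedef void *(*memset_fn)(void *dst, int c, size_t n);

/* Hypothetical helper: calls fn(buf + offset, c, n) `iters` times and returns
 * the fill throughput in GB/s. offset = 0 gives a malloc-aligned destination,
 * offset = 1 gives the misaligned memset(dst + 1, c, n) case. */
double memset_throughput(memset_fn fn, size_t n, size_t offset, size_t iters)
{
    unsigned char *buf = malloc(n + offset);
    struct timespec start, end;
    double secs;
    size_t i;

    assert(buf != NULL && iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        fn(buf + offset, (int)(i & 0xff), n);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Sanity check: the last call filled the buffer with (iters - 1) & 0xff. */
    for (i = 0; i < n; i++)
        assert(buf[offset + i] == (unsigned char)((iters - 1) & 0xff));
    free(buf);

    secs = (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)n * (double)iters / secs / 1e9 : 0.0;
}
```

For example, `memset_throughput(memset, 4096, 1, 10000)` measures the misaligned case for n = 4096. A single timed loop like this ignores warm-up, cache residency and CPU frequency scaling, so it only gives a rough indication.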

The C version still has the advantage of being plain C code, which benefits all architectures.
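To illustrate what an unrolled C memset can look like, here is a hypothetical sketch (`memset_unrolled` is a made-up name; this is not the code from the patch or from Wine). It replicates the fill byte into a 64-bit word, covers the unaligned head and tail with overlapping 8-byte stores, and fills the aligned middle with four 8-byte stores per iteration. Being plain C, it builds unchanged on i386, x86_64 and other architectures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a portable unrolled memset. Every wide store goes
 * through memcpy(.., 8) to avoid alignment and strict-aliasing problems;
 * at -O2 compilers typically lower it to a single 8-byte store. */
void *memset_unrolled(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char *end = d + n;
    uint64_t v = 0x0101010101010101ull * (unsigned char)c;

    if (n < 8)
    {
        /* Small sizes: a simple byte loop. */
        while (d < end) *d++ = (unsigned char)c;
        return dst;
    }

    /* Unaligned head and tail stores. They may overlap the stores below,
     * which is harmless since every store writes the same value. */
    memcpy(d, &v, 8);
    memcpy(end - 8, &v, 8);

    /* Round d up to the next 8-byte boundary so the loop stores are aligned;
     * the skipped bytes were already written by the head store. */
    d += (8 - ((uintptr_t)d & 7)) & 7;

    /* Main loop: four 8-byte stores (32 bytes) per iteration. */
    while (end - d >= 32)
    {
        memcpy(d, &v, 8);
        memcpy(d + 8, &v, 8);
        memcpy(d + 16, &v, 8);
        memcpy(d + 24, &v, 8);
        d += 32;
    }

    /* Remaining whole words; the last < 8 bytes were covered by the tail store. */
    while (end - d >= 8)
    {
        memcpy(d, &v, 8);
        d += 8;
    }
    return dst;
}
```

In an actual C runtime, care may be needed so the compiler does not recognize loops like the small-size byte loop and turn them back into a call to memset itself; with GCC, -fno-tree-loop-distribute-patterns is one way to prevent that.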
-- 
Rémi Bernon <rbernon at codeweavers.com>