[PATCH] msvcrt: Faster memcmp().
Piotr Caban
piotr.caban at gmail.com
Wed Apr 13 08:50:20 CDT 2022
Hi Jan,
On 4/6/22 19:14, Jan Sikorski wrote:
> -/*********************************************************************
> - * memcmp (MSVCRT.@)
> - */
> -int __cdecl memcmp(const void *ptr1, const void *ptr2, size_t n)
> +static inline int memcmp_unaligned(const void *ptr1, const void *ptr2, size_t n)
> {
> const unsigned char *p1, *p2;
>
> @@ -2690,6 +2687,64 @@ int __cdecl memcmp(const void *ptr1, const void *ptr2, size_t n)
> return 0;
> }
I think it would be good to optimize memcmp_unaligned a little. I'm
thinking about something along these lines (untested):
static int memcmp_size_t(size_t s1, size_t s2)
{
const uint8_t *p1 = (const uint8_t*)&s1, *p2 = (const uint8_t*)&s2;
while (*p1 == *p2)
{
p1++;
p2++;
}
return *p1 > *p2 ? 1 : -1;
}
static int memcmp_unaligned(const char *c1, const char *c2, size_t len)
{
int sh1 = 8 * ((size_t)c2 % sizeof(size_t));
int sh2 = 8 * sizeof(size_t) - sh1;
const size_t *s1 = (const size_t*)c1;
const size_t *s2 = (const size_t*)(c2 - sh1 / 8);
size_t x, y, m;
x = s2[0];
do
{
y = s2[1];
m = MERGE(x, sh1, y, sh2);
if (*s1 != m)
return memcmp_size_t(*s1, m);
s1++;
s2++;
len--;
x = y;
} while (len);
return 0;
}
Where MERGE is already defined in string.c file, len is length in
sizeof(size_t) blocks instead of bytes. It may be even better to switch
to uint64_t instead of using size_t like in memset. You can also take a
look on glibc platform independent implementation (it uses the MERGE
trick + loop unrolling, according to some random benchmark it's on par
with your implementation performance wise on i386/x86_64 and is much
faster on arm).
Thanks,
Piotr
More information about the wine-devel
mailing list