[PATCH] msvcrt: Import memmove from musl

Thu Aug 20 06:35:09 CDT 2020

Hi Fabian,

I'm still looking into possible ways of optimizing the function. Your
patch only affects a subset of memmove calls. It also slows down some
cases a lot (around 1.5-2 times). I don't have code ready yet, but it
looks like it will be possible to write a C implementation that is ~10%
slower than native.

Also, quick testing shows that gcc and clang optimize a simple
implementation very well. Something like:
https://source.winehq.org/patches/data/191083
(it's incorrect, I didn't mean to send it to wine-devel yet) performs
similarly to native if the -O2 option is used. The same implementation
is terribly slow if -O0 is used.

I'm not sure yet how complicated code that doesn't depend on the
compiler to optimize it will be. I'm planning to implement a
proof-of-concept patch to check that next.

I'm hoping that we will come up with a better patch, but here are a few
comments about your patch:
  - the __GNUC__ checks are not needed
  - the WT alias is not needed
  - it doesn't behave correctly in the d == s case with invalid
pointers or write watches
  - it decreases performance a lot if the buffers overlap or the word
copying path is not used
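
To make the last point concrete, here is a rough sketch of a forward
copy with a word-copying case. It assumes the musl-style approach of
copying whole words only when source and destination have the same
alignment modulo the word size, and falling back to byte copies
otherwise. The name copy_forward_words is hypothetical, the sketch
covers only the forward direction, and it uses fixed-size memcpy calls
for the word loads and stores (an assumption made here to stay within
standard C, not something taken from the patch).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: forward-only copy, valid when the buffers do
 * not overlap or when dst starts below src. */
static void *copy_forward_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t ws = sizeof(size_t);

    if ((uintptr_t)d % ws == (uintptr_t)s % ws) {
        /* Copy single bytes until both pointers are word aligned. */
        while ((uintptr_t)d % ws && n) {
            *d++ = *s++;
            n--;
        }
        /* Copy one word at a time. Compilers usually lower these
         * fixed-size memcpy calls to a single load and store. */
        for (; n >= ws; n -= ws, d += ws, s += ws) {
            size_t w;
            memcpy(&w, s, ws);
            memcpy(d, &w, ws);
        }
    }
    /* Remaining tail, or the whole buffer if the alignments differ. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

When the two alignments differ, the whole buffer goes through the final
byte loop, which is the slow case.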

I've also tested the full implementation from musl (which uses their
memcpy implementation in some cases). It performs much better, but it's
still much slower than native if the buffers overlap (around 3 times
slower). It should be possible to optimize this case as well.

Thanks,
Piotr