msvcrt - memmove/memcpy optimizations

Paul Gofman pgofman at codeweavers.com
Fri Aug 14 06:46:12 CDT 2020


Regarding memcpy performance, I also recently came across suboptimal 
memcpy / memmove performance while doing perf analysis of the Shadow of the 
Tomb Raider game. While in that case I did not find memcpy to be 
responsible for any significant slowdown (maybe ~2-3 fps at most, 
together with the math function implementations), it drew attention by 
consistently appearing in perf top and taking a measurable amount of CPU 
time.

I am attaching a very short test program. It runs in ~7.4s using builtin 
vcruntime140 here and in ~2s using native vcruntime140 under Wine (compiled 
with x86_64-w64-mingw32-gcc ./memcpyperf.c -o memcpyperf).

On 8/14/20 11:27, piotr at codeweavers.com wrote:
> Hi Fabian,
>
> I'll be back from vacation on Monday (currently I have very limited 
> internet access). I'll look at it then.
>
> I'm not sure how complicated the assembly implementation is but I'm 
> expecting that a separate assembly file will not be needed. Also, 
> AFAIK, we can't take the implementation from glibc. It would also be 
> useful to know how efficient the Microsoft implementation is.
>
> Musl also has platform-specific implementations of memmove (for i386 
> and x64) written in assembly. I bet it should be good enough for Wine.
>
> Thanks,
> Piotr
>
> On Aug 12, 2020 23:33, Fabian Maurer <dark.shadow4 at web.de> wrote:
>
>     Hello,
>
>     since msvcrt isn't relying on the standard library memmove/memcpy
>     anymore, there's been a pretty bad performance regression. See
>     https://bugs.winehq.org/show_bug.cgi?id=49663.
>
>     For the best performance, and since those memory operations are
>     pretty common, we'd presumably like to optimize them as much as
>     possible. You might have seen my patch for an implementation from
>     musl, although Zebediah rightfully pointed out we might want to opt
>     for the best performance we can get...
>     glibc currently offers the best performance, thanks to SSE/AVX
>     implementations and runtime selection of the best supported path.
>
>     First, would you have any objections to adding specialized paths
>     written in assembly for x86?
>     And if we were to add them, would we link against assembly files,
>     or somehow transform them into inline assembly? AFAIK, Wine doesn't
>     come with pure assembly files yet...
>
>     If you want, I could set up a few crude benchmarks to see how
>     different versions compare.
>
>     Regards,
>     Fabian Maurer
>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: memcpyperf.c
Type: text/x-csrc
Size: 1003 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20200814/99e1f799/attachment.c>

