[RFC PATCH 00/11] Thread-local heap implementation.
Dmitry Timoshkov
dmitry at baikal.ru
Wed May 6 10:32:30 CDT 2020
Rémi Bernon <rbernon at codeweavers.com> wrote:
> >> This is a heap implementation based on thread-local structures, that I
> >> have been keeping locally for quite some time. The goal was to improve
> >> Wine's heap performance in multithreaded scenarios and see if it could
> >> help performance in some games.
> >>
> >> The good news is that this implementation is performing well, according
> >> to third-party heap micro benchmarks. The bad news is that it doesn't
> >> change performance much in general, as allocations are usually scarce
> >> during gameplay. I could still see improvements in loading times, and
> >> less stalling as well.
> >
> > Have you looked at Sebastian's heap improvement patches in the staging
> > tree? According to Sebastian's and Michael's testing: "The new heap allocator
> > uses (inspired by the way how it works on Windows) various fixed-size free
> > lists, and a tree data structure for large elements. With this implementation,
> > I get up to 60% improvement for apps with the "bad allocation pattern",
> > and up to 15% improvement in the "good case"."
> >
>
> I believe these patches are also shipped in Proton, and although it's
> performing better than the upstream heap there's still a lot of
> contention when multiple threads try to (de)allocate at the same time.
>
> For reference, I used https://github.com/mjansson/rpmalloc-benchmark as a
> raw performance measurement. It starts a given number of threads, with
> each thread doing a fixed number of iterations. In every iteration, the
> thread allocates and frees a certain amount of memory, optionally with
> cross-thread allocation every other iteration, then does a given amount
> of computation using the allocated buffers as storage. Finally, it
> measures the time it took to do all these operations.
>
> For instance, with these benchmark parameters as indicated on their
> sample result page[1]:
>
> <num threads> 0 0 2 20000 50000 5000 16 1000
>
> I get the following results with the various implementations, using
> two concurrent threads (the higher the number of threads, the worse it
> gets, especially for the default Wine heap):
>
> * linux crt: 5675754 memory ops/CPU second, 53% overhead
> * wine rpmalloc: 19700003 memory ops/CPU second, 131% overhead
> * wine upstream: 248333 memory ops/CPU second, 62% overhead
> * wine staging: 914004 memory ops/CPU second, 61% overhead
> * wine lfh: 10651300 memory ops/CPU second, 114% overhead
Do you have the numbers for various Windows flavours on the same hardware?
--
Dmitry.
More information about the wine-devel mailing list