ntdll: Fixed some heap allocation stalls
Matteo Bruni
matteo.mystral at gmail.com
Sat Nov 3 12:28:21 CDT 2012
2012/11/3 Steaphan Greene <steaphan at gmail.com>:
> On 11/03/2012 09:04 AM, Matteo Bruni wrote:
>>
>> 2012/11/2 Steaphan Greene<steaphan at gmail.com>:
>>>
>>> Running a game in wine showed it performing terribly. I traced this to the
>>> fact that it allocates and deallocates tiny memory chunks over and over (I
>>> suspect it's in C++ and passing things by value everywhere). This led to
>>> huge stalls because the heap bins weren't fine-grained enough (these
>>> differed in size in steps of 8 bytes, but the bins differed by 16+, so it
>>> spent a lot of time searching each bin for a bigger block). I added more
>>> fine-grained sizes to the smaller end of this, and now it runs faster in
>>> wine than it does natively. :)
>>>
>>> This was run on Debian squeeze, amd64.
>>>
>>> Note, this is my first submission to wine in nearly 15 years. So, of
>>> course, everything has changed with how this works now. Hope I have this
>>> all right.
>>>
>>> ---
>>> dlls/ntdll/heap.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c
>>> index a9044714..eb7406b 100644
>>> --- a/dlls/ntdll/heap.c
>>> +++ b/dlls/ntdll/heap.c
>>> @@ -116,7 +116,9 @@ C_ASSERT( sizeof(ARENA_LARGE) % LARGE_ALIGNMENT == 0 );
>>> /* Max size of the blocks on the free lists */
>>> static const SIZE_T HEAP_freeListSizes[] =
>>> {
>>> - 0x10, 0x20, 0x30, 0x40, 0x60, 0x80, 0x100, 0x200, 0x400, 0x1000, ~0UL
>>> + 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40, 0x48, 0x50, 0x58, 0x60, 0x68,
>>> + 0x70, 0x78, 0x80, 0x88, 0x90, 0x98, 0xA0, 0xA8, 0xB0, 0xB8, 0xC0, 0xC8,
>>> + 0xD0, 0xD8, 0xE0, 0E88, 0xF0, 0xF8, 0x100, 0x200, 0x400, 0x1000, ~0UL
>>> };
>>> #define HEAP_NB_FREE_LISTS (sizeof(HEAP_freeListSizes)/sizeof(HEAP_freeListSizes[0]))
>>
>> That 0E88 looks quite wrong ;)
>>
>> Apart from that, although I'm not an expert on this code, this patch
>> looks reasonable to me. Maybe we don't want so many free lists, but I
>> don't see big downsides to that (e.g. the linear search can be
>> replaced by a binary search, if need be). Maybe adding a smaller list
>> at the start (e.g. 0x8) makes sense too?
>
>
> Yep, that's a typo. Don't know how that got past me. Sorry. Should I
> resend a corrected version?
>
Yes. You should also add a (try 2) to the email subject.

> 0x08 was left off because these values include the overhead of the arena
> info, so the smallest requested size of 8 actually searches for larger than
> 8 (12, I think).

Ah, good point, I misread the code. Yes, it makes perfect sense as it is.
>
> I did try with fewer (every 16 instead of every 8), and, though it was still
> a dramatic improvement, it was still slow.
>
I was thinking about e.g. going every 16 after 0x80, or some similar
pseudo-exponential growth, but that really depends on the allocation
patterns of the applications.

Speaking of which, it might be a nice follow-up patch to add some free
list usage stats, to get some idea of what different applications do
in that regard.
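
To make the ideas above a bit more concrete, here is a minimal, untested
sketch in C. It is not taken from heap.c: the table layout (8-byte steps up
to 0x80, 16-byte steps up to 0x100, then the existing coarse sizes) is only
one possible choice, and the names free_list_sizes, find_free_list_index,
free_list_hits and dump_free_list_stats are made up for illustration. It
uses size_t instead of SIZE_T so that it builds standalone.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table, not a tested proposal: 8-byte steps up to 0x80,
 * 16-byte steps up to 0x100, then the coarse sizes of the current table.
 * The last entry is a catch-all, so a lookup always finds a list. */
static const size_t free_list_sizes[] =
{
    0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40, 0x48,
    0x50, 0x58, 0x60, 0x68, 0x70, 0x78, 0x80,
    0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xe0, 0xf0, 0x100,
    0x200, 0x400, 0x1000, ~(size_t)0
};
#define NB_FREE_LISTS (sizeof(free_list_sizes) / sizeof(free_list_sizes[0]))

/* Hypothetical usage stats: number of lookups that ended in each list. */
static unsigned long free_list_hits[NB_FREE_LISTS];

/* Binary search for the first list whose max block size is >= size,
 * as a possible replacement for a linear scan over the table. */
static unsigned int find_free_list_index( size_t size )
{
    unsigned int lo = 0, hi = NB_FREE_LISTS - 1;

    while (lo < hi)
    {
        unsigned int mid = lo + (hi - lo) / 2;

        if (free_list_sizes[mid] >= size) hi = mid;
        else lo = mid + 1;
    }
    free_list_hits[lo]++;
    return lo;
}

/* Print the counters for every list that was hit at least once. */
static void dump_free_list_stats( void )
{
    unsigned int i;

    for (i = 0; i < NB_FREE_LISTS; i++)
        if (free_list_hits[i])
            printf( "list %2u (max %#lx): %lu lookups\n", i,
                    (unsigned long)free_list_sizes[i], free_list_hits[i] );
}
```

Calling something like find_free_list_index() on each allocation and dumping
the counters at exit would show which lists a given application actually
hits. Whether a denser table or the binary search pays off in practice would
still have to be measured.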