ntdll: Fixed some heap allocation stalls

Matteo Bruni matteo.mystral at gmail.com
Sat Nov 3 12:28:21 CDT 2012


2012/11/3 Steaphan Greene <steaphan at gmail.com>:
> On 11/03/2012 09:04 AM, Matteo Bruni wrote:
>>
>> 2012/11/2 Steaphan Greene<steaphan at gmail.com>:
>>>
>>> Running a game in wine showed it performing terribly.  I traced this to
>>> the
>>> fact that it allocates and deallocates tiny memory chunks over and over
>>> (I
>>> suspect it's in C++ and passing things by value everywhere).  This led to
>>> huge stalls because the heap bins weren't fine-grained enough (these
>>> differed in size in steps of 8 bytes, but the bins differed by 16+, so it
>>> spent a lot of time searching each bin for a bigger block).  I added more
>>> fine-grained sizes to the smaller end of this, and now it runs faster in
>>> wine than it does natively. :)
>>>
>>> This was run on Debian squeeze, amd64.
>>>
>>> Note, this is my first submission to wine in nearly 15 years.  So, of
>>> course, everything has changed with how this works now.  Hope I have this
>>> all right.
>>>
>>> ---
>>>   dlls/ntdll/heap.c |    4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c
>>> index a9044714..eb7406b 100644
>>> --- a/dlls/ntdll/heap.c
>>> +++ b/dlls/ntdll/heap.c
>>> @@ -116,7 +116,9 @@ C_ASSERT( sizeof(ARENA_LARGE) % LARGE_ALIGNMENT == 0
>>> );
>>>   /* Max size of the blocks on the free lists */
>>>   static const SIZE_T HEAP_freeListSizes[] =
>>>   {
>>> -    0x10, 0x20, 0x30, 0x40, 0x60, 0x80, 0x100, 0x200, 0x400, 0x1000,
>>> ~0UL
>>> +    0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40, 0x48, 0x50, 0x58, 0x60,
>>> 0x68,
>>> +    0x70, 0x78, 0x80, 0x88, 0x90, 0x98, 0xA0, 0xA8, 0xB0, 0xB8, 0xC0,
>>> 0xC8,
>>> +    0xD0, 0xD8, 0xE0, 0E88, 0xF0, 0xF8, 0x100, 0x200, 0x400, 0x1000,
>>> ~0UL
>>>   };
>>>   #define HEAP_NB_FREE_LISTS
>>> (sizeof(HEAP_freeListSizes)/sizeof(HEAP_freeListSizes[0]))
>>
>> That 0E88 looks quite wrong ;)
>>
>> Apart from that, although I'm not an expert for this code, this patch
>> looks reasonable to me. Maybe we don't want so many free lists, but I
>> don't see big downsides for that (e.g. the linear search can be
>> replaced by a binary search, if need be). Maybe adding a smaller list
>> at the start (e.g. 0x8) makes sense too?
>
>
> Yep, that's a typo.  Don't know how that got past me.  Sorry.  Should I
> resend a corrected version?
>

Yes. You should also add a (try 2) to the email subject.

> 0x08 was left off because these values include the overhead of the arena
> info, so the smallest requested size of 8 actually searches for larger than
> 8 (12, I think).

Ah, good point, I misread the code. Yes, it makes perfectly sense as it is.

>
> I did try with fewer (every 16 instead of every 8), and, though it was still
> a dramatic improvement, it was still slow.
>

I was thinking about e.g. going every 16 after 0x80, or some similar
pseudo-exponential growing, but that really depends on the allocation
pattern of the applications.

Speaking of which, it might be a nice followup patch to add some free
lists usage stats, to get some idea of what different applications do
in that regard.



More information about the wine-devel mailing list