[Bug 50292] Process-local synchronization objects use private interfaces into the Unix library

Mon Feb 8 10:51:36 CST 2021

https://bugs.winehq.org/show_bug.cgi?id=50292

--- Comment #17 from Zebediah Figura <z.figura12 at gmail.com> ---
(In reply to Jacek Caban from comment #16)
> I meant it as a replacement, not an addition. I may have been an overly
> optimistic about just 2 dim array, 3 dim could be less wasteful (although
> there may be a better structure). The point is to make a single tid slot
> cheap enough (4 bytes seems more than enough, but it could be even larger if
> needed) so that we may over-allocate and never free. Over-allocation makes
> allocation of a single slot cheap (because it will usually be already
> allocated) and never freeing avoids any need for locking.
> 
> It may seem like a wasteful strategy, but note that server gives us a good
> locality of tid values and reuses previously freed ids. In practice, I'd
> expect that a single block of a few thousands entries should be enough for
> majority of applications. If we optimize for that (while being able to
> handle even a theoretical case of 2**30 created threads with enough
> allocations), I'd hope that we can get decent results. I can't be sure
> without trying, maybe I'm missing something.

Okay, that's good, that sounds like the ideal strategy to me too. I just wasn't
sure if there was a reasonable tradeoff between maximum number of threads
possible + amount of memory unnecessarily reserved + lookup time. (I am less
than completely enthused about the idea of a 3-dimensional list—I think we
should probably be optimizing for lookup time above all else.)

I think we do need 8 bytes, at least for 64-bit, or perhaps just for 64-bit
Mac. A futex is 4, but on Mac we use semaphore_t, which is a pointer (though I
haven't looked to see if there's a better solution), and if neither is
available we use an event (we could possibly use a keyed event, though we'd
have to introduce some locking—but we're already on the slow path).

Thinking out loud here: a two-dimensional array of 65536 bytes * 4096 entries
would give us 2**26 threads on 32-bit, 2**25 on 64-bit, while using 16+(4 or 8)
pages in the common case. 32-bit has a much lower limit imposed by the stack
size, so that's not a concern, but is that enough threads for 64-bit? Raising
that to 2**30 for 64-bit would be a reservation of 65536 entries = 128 pages,
which is kind of a lot, though in practice only the first is ever faulted in
and we have VA space to spare.

Does that sound like a reasonable compromise? I think we could even shrink the
32-bit reservations so to use less address space.

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.