[Bug 53372] Total War Shogun 2 spews RtlLeaveCriticalSection() section is not acquired errors in 3D scenes.

WineHQ Bugzilla wine-bugs at winehq.org
Wed Jul 27 20:07:21 CDT 2022


https://bugs.winehq.org/show_bug.cgi?id=53372

--- Comment #9 from Zeb Figura <z.figura12 at gmail.com> ---
I managed to get access to an NVidia machine and was able to reproduce the same
deadlock on startup.

As described above, the changes to d3d don't make a lot of sense, so I tried
double-checking, and I think I found a different commit that's to blame. It's
hard to be sure, because the deadlock is tetchy and sometimes won't reproduce
until the dozenth run, but I think the offending commit is:

commit 18ae96e5fb3cbbd53f1a022ba81203de6b431228
Author: Zhiyi Zhang <zzhang at codeweavers.com>
Date:   Mon Apr 25 17:22:16 2022 +0800

    winex11.drv: Lock display when expecting error events.

    If the display is not locked, another thread could take the error event and
    handle it with the default error handlers and thus not handled by the
    current thread with the specified error handlers.

    Fix Cladun X2 crash at start.

    Signed-off-by: Zhiyi Zhang <zzhang at codeweavers.com>

More interestingly, if I look at the process state when the game is hung, I
notice that, while the main thread is spinning at 100% CPU (waiting for the CS
thread), the CS thread is sleeping, and another CS thread which was already
shut down is also sleeping. Further tracing shows that the "old" CS thread has
terminated and is no longer running any win32 code, but hasn't actually exited.
And I'm unable (after a few tries) to reproduce with csmt=0.

My suspicion, although I have no way to verify this, is that the NVidia driver
is deadlocking because of a lock ordering problem. I am guessing that it does
thread cleanup with pthread_cleanup_push(), and that inside of that it grabs
some internal lock (a GLX context lock?) and then calls XLockDisplay(), and
that glXCreateContext() grabs the same lock, resulting in a lock inversion when
the latter is called while already in XLockDisplay().
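
To make the suspected inversion concrete, here is a minimal model of it in
Python. This is only a sketch of the hypothesis above, not of the real code:
display_lock stands in for XLockDisplay(), driver_lock stands in for the
guessed driver-internal lock, and both names (and the thread names) are
invented for illustration. Timeouts are used so the demo reports the deadlock
instead of hanging; the real locks would have no timeout.

```python
import threading

# Hypothetical stand-ins: display_lock models XLockDisplay(), driver_lock models
# the guessed driver-internal lock. Neither name exists in libX11 or the driver.
display_lock = threading.Lock()
driver_lock = threading.Lock()


def run_inversion(timeout=0.5):
    """Two threads take the same two locks in opposite order."""
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()  # each thread now owns exactly one lock
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            both_tried.wait()  # keep 'first' held until the other thread gave up

    threads = [
        # Models glXCreateContext() called with the display already locked.
        threading.Thread(target=worker,
                         args=("create_context", display_lock, driver_lock)),
        # Models the guessed cleanup path: driver lock first, then the display.
        threading.Thread(target=worker,
                         args=("thread_cleanup", driver_lock, display_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_consistent():
    """Both threads take display_lock first, then driver_lock: no inversion."""
    results = {}

    def worker(name):
        with display_lock:
            with driver_lock:
                results[name] = True

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("create_context", "thread_cleanup")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("opposite order:", run_inversion())
    print("same order:    ", run_consistent())
```

In the opposite-order case neither thread ever gets its second lock, which
would match a process where both threads are sleeping and the thread waiting
on them never makes progress. With a consistent order both threads finish.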

If I'm right, I don't know whose bug this really is. XLockDisplay() is part of
libX11, not the X11 protocol, and while libX11 is documented, its behaviour
under threading like this doesn't seem to be specified. If I had to give a
reading, though, I'd say that since there's nothing in the documentation
preventing us from calling glXCreateContext() with a locked display (and since
we have a good reason to do so), this is NVidia's bug.


Patrick, does reverting 18ae96e5fb help? I'd expect it to at least get rid of
the deadlock (although it's possible that I haven't sufficiently tested and
that my analysis is wrong), but it may not get rid of the OOM errors; those may
have a separate cause (e.g. the streaming buffer from 66f37aae7e2 is somehow
growing too large, which would make a lot more sense...)

And if not, does turning off CSMT help?

I was able to reproduce some OOM errors, but they went away after applying both
of the merge requests I linked earlier. (Which are, by the way, both upstream
by now.)

Probably the hang on startup should be split out to a different bug in any
case.
