Investigating the XInitThreads deadlock bug

Sami Aario sami.aario at surfeu.fi
Wed Feb 4 15:03:20 CST 2004


Hi all,

I decided to investigate this old bug:

http://www.winehq.com/hypermail/wine-devel/2003/12/0158.html

even though Alexander removed thread initialization from x11drv. So the
first thing I did was apply this patch:

Index: x11drv_main.c
===================================================================
RCS file: /home/wine/wine/dlls/x11drv/x11drv_main.c,v
retrieving revision 1.87
diff -u -r1.87 x11drv_main.c
--- x11drv_main.c 30 Jan 2004 22:54:39 -0000 1.87
+++ x11drv_main.c 1 Feb 2004 18:02:26 -0000
@@ -332,6 +332,8 @@

     /* Open display */

+    if (!XInitThreads()) ERR( "XInitThreads failed, trouble ahead\n" );
+
     if (!(display = XOpenDisplay( NULL )))
     {
         MESSAGE( "x11drv: Can't open display: %s\n", XDisplayName(NULL) );

I determined that the tests under user/tests were causing deadlock. The
attached file debugrun1 shows this. Start the debugger with:

winedbg user32_test.exe.so

and repeat the steps I made, and the executable should deadlock at the call
to XGetScreenSaver. (By the way, I used a version of the tests where I'd
commented out all other tests except the test_SPI_GETSCREENSAVEACTIVE one,
and that's why you only see "testing SPI_{GET,SET}SCREENSAVEACTIVE", but
that's not important.)

Next, I decided to investigate which mutual exclusion primitive is causing
the trouble. This run is included as debugrun2. There are two threads
involved: thread 0xb is the application main thread, and 0xe is a thread
created by the main thread to generate the tests. Thread 0xb owns a critical
section that 0xe wants, and hence deadlock. Note the call to "walk thread"
in the beginning. Note also that I single-stepped thread 0xe right up to the
call to RtlpWaitForCriticalSection that deadlocks.

I also added some display commands to show you the details of the critical
section in question. Now, if you look at what the display says you'll
notice something odd: at first crit = 0x0011d1e0, but then it switches to
0x403112b8. And this is without any breakpoints having been set up within
RtlEnterCriticalSection! So it can't be a context switch - it seems like a
bug in the debugger. Or is this intentional in some way?

Anyway, this critical section is the one the x11drv library sets up when
XInitThreads is called. Next I attempted to find out where the main thread
acquires the critical section, before the deadlock. This is attached as
debugrun3. Notice that I worked around the odd debugger feature mentioned
in the previous paragraph, by placing a breakpoint not at the entry point
of RtlEnterCriticalSection, but a few next's later. This allowed me to add
the condition crit==0x403112b8. The "crit" pointer seems to consistently
stay the same between debugger sessions, so this works (for me at least).

See how LockCount is 0, even though this is the first call to
RtlEnterCriticalSection with crit==0x403112b8 during this session? It should
be -1 for a critical section that is being acquired for the first time! So
this made me suspect that the crit sect is acquired during initialization
(and then not released), which is why I applied the following patch:

Index: critsection.c
===================================================================
RCS file: /home/wine/wine/dlls/ntdll/critsection.c,v
retrieving revision 1.23
diff -u -r1.23 critsection.c
--- critsection.c 5 Sep 2003 23:08:34 -0000 1.23
+++ critsection.c 4 Feb 2004 20:15:46 -0000
@@ -243,6 +243,7 @@
  */
 NTSTATUS WINAPI RtlEnterCriticalSection( RTL_CRITICAL_SECTION *crit )
 {
+    if (crit==(RTL_CRITICAL_SECTION *)0x403112b8) TRACE("critsection=%p:
LockCount=%ld\n", crit, crit->LockCount);
     if (interlocked_inc( &crit->LockCount ))
     {
         if (crit->OwningThread == (HANDLE)GetCurrentThreadId())

and then launched the debugger with:

winedbg --debugmsg -all,+ntdll user32_test.exe.so

(I didn't bother with making the reference to crit->LockCount thread-safe.)
The result of this is attached as debugrun4. Notice this line:

No debug information in ELF '/usr/X11R6/lib/libXrender.so.1' (0x40016000)

See how after four calls to RtlEnterCriticalSection, LockCount starts to be
0? So a pretty good theory is that the critical section is acquired, and
then
not released, during the loading and initialization of
/usr/X11R6/lib/libXrender.so.1. This theory is not bulletproof, which is why
I attempted to see whether LockCount ever becomes -1 again, but the debugger
consistently barfed on me (== crashed).

Well, that's it for now - I'll let someone more knowledgeable in libraries
and
their initialization carry on with this, if the're interested. A final note:
naturally, you most likely won't have the same address for the critical
section
as me, but you should be able to find out what it is if you redo the steps I
made. Have fun!

--
Sami
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugrun1
Type: application/octet-stream
Size: 6087 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-devel/attachments/20040204/fc8803a8/debugrun1.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugrun2
Type: application/octet-stream
Size: 14916 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-devel/attachments/20040204/fc8803a8/debugrun2.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugrun3
Type: application/octet-stream
Size: 3446 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-devel/attachments/20040204/fc8803a8/debugrun3.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugrun4
Type: application/octet-stream
Size: 17138 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-devel/attachments/20040204/fc8803a8/debugrun4.obj


More information about the wine-devel mailing list