[PATCH v3 0/2] MR135: Cancel asyncs queued by thread when the thread is terminated

Paul Gofman pgofman at codeweavers.com
Fri Jun 3 20:46:18 CDT 2022

On 6/3/22 20:30, Zebediah Figura wrote:
> On 6/3/22 15:12, Paul Gofman (@gofman) wrote:
> So, to be clear, in the offending program:
> (1) thread terminates itself via NtTerminateThread();
> (2) thread calls exit_thread(), which means that its TEB and stack are 
> queued to be freed but are not actually freed yet;
> (3) thread closes its server socket, causing the server to notice and 
> kill_thread( thread, 0 );
> (4) another thread exits, and the TEB of the first thread is freed
> (5) select async is eventually signaled (we never cancel it, although 
> we should), and the server ends up queueing SIGUSR1 to some random 
> thread, since the original thread is terminated
> (6) APC_ASYNC_IO handler in said random thread tries to access the 
> original thread's stack and crashes
> Do I have that right?
I didn't debug every bit of how and at which moment the exact the stack 
is freed, is that important? What I observed was:

a) Main thread queues user APC to the network thread;

b) Once that APC is executed (requires alertable wait in select() and 
recv()), it ends up being terminated normally, ending up in 
NtTerminateThread for itself;

c) Apparently here, between b) and d) the thread ends up being freed;

d) - (5), (6) from your list.

So AFAIU it matches the sequence you describe.

> So, with that said, thoughts:
> * If a thread dies violently, we can cancel its asyncs from the server 
> side [in kill_thread(), probably? or the terminate_thread handler; see 
> below], and we don't need to worry about them accessing the stack, 
> since the stack is never freed.


> * If a thread dies naturally (terminates itself), we can cancel its 
> asyncs from the terminate_thread handler, and then, in the client side 
> but before calling exit_thread(), do a zero-length wait with 
> SELECT_INTERRUPTIBLE, exactly as in v1 of this patch set. My 
> understanding is that these alone are actually sufficient to ensure 
> that any APC_ASYNC_IO resulting from the async termination are 
> executed by the thread itself before it frees its stack. I.e. we don't 
> need any of the "queue_only" bits that were in v1. This holds because:
>   - terminate_thread doesn't set the state to TERMINATED yet in the 
> "self" case;


>   - as a result queue_apc() will queue the (system) APC to that 
> specific thread;

It will not put it to the queue, it will send SIGUSR1. As (without the 
long and probably ugly part which was in v1 of the patch) it puts the 
APC in queue only if the thread is waiting in server select or already 
has some APC in queue. The other way (probably not super nice as well) 
would be to put APC_NONE to the thread system APC queue, then the logic 
in queue_apc() would queue our async cancels to the queue as well.

>   - server_select( SELECT_INTERRUPTIBLE ) is actually sufficient to 
> process the entire APC queue. It doesn't matter if SIGUSR1 is sent 
> (and in this case it will always be—we're in the middle of a server 
> request but not a server *select* at the point the APC is queued), 
> SIGUSR1 is only needed to wake up a thread that isn't otherwise paying 
> attention.

server_select( SELECT_INTERRUPTIBLE ) is sufficient, but due to the 
above it will be redundant in most of the practical cases as the server 
would already have sent the USR1;

> * There are some other potential bugs in this area I noticed while 
> looking around (e.g. we should probably requeue system APCs of a 
> terminated thread to another thread instead of dropping them on the 
> floor, also apparently cancellation should always wait), but those 
> seem orthogonal to the problem here and so can wait.
> So to fix the bug it should be sufficient to
> (a) cancel all thread asyncs in the terminate_thread handler,
> (b) server_select( SELECT_INTERRUPTIBLE ) before calling exit_thread()

As I understand, in the logic on v1 patch (which I believe is a bit more 
robust albeit much more convoluted) we also need to make sure that the 
APCs get to the queue and not USR1 (for the 'self' terminating thread). 
In the logic of v2 server_select() is redundant, the thread will get 
USR1 and process it itself before returning from server call except for 
some theoretical corner cases (like, absent thread kill syscall when the 
syscall will be sent to the whole process and thus random thread).

More information about the wine-devel mailing list