[PATCH 01/15] server: Wait before suspending threads in APC.

Jacek Caban jacek at codeweavers.com
Tue Jan 28 08:50:07 CST 2020


On 28.01.2020 14:43, Rémi Bernon wrote:
> Let's name these locations:
>
> [A]:
> @@ -1591,6 +1595,7 @@ DECL_HANDLER(select)
>              if (apc->call.type != APC_NONE &&
>                  (reply->apc_handle = alloc_handle( current->process, 
> apc, SYNCHRONIZE, 0 )))
>              {
> +                current->in_apc = 1;
>                  reply->call = apc->call;
>                  release_object( apc );
>                  break;
>
> [B]:
> @@ -1575,6 +1577,8 @@ DECL_HANDLER(select)
>          wake_up( &apc->obj, 0 );
>          close_handle( current->process, req->prev_apc );
>          release_object( apc );
> +        current->in_apc = 0;
> +        stop_thread_if_suspended( current );
>      }
>
>      reply->timeout = select_on( &select_op, op_size, req->cookie, 
> req->flags, req->timeout );
>
> [C]:
> @@ -583,6 +584,7 @@ static void set_thread_info( struct thread *thread,
>  /* stop a thread (at the Unix level) */
>  void stop_thread( struct thread *thread )
>  {
> +    if (thread->in_apc) return;  /* currently doing apc, will be 
> suspended on return */
>      if (thread->context) return;  /* already inside a debug event, no 
> need for a signal */
>      /* can't stop a thread while initialisation is in progress */
>      if (is_process_init_done(thread->process)) send_thread_signal( 
> thread, SIGUSR1 );
>
>
>
> When the APC_BREAK_PROCESS executes, a servicing thread calls the 
> select request, which returns through [A], flagging the thread.
>
> It returns from the select request, and does the APC call -which 
> creates the exception thread. Then, I see two possible ordering:
>
> 1) The servicing thread calls first the select request to notify APC 
> completion, goes to [B] which wakes the APC caller -resuming 
> DbgBreakProcess on the debugger side, and unflags the thread but it's 
> not suspended yet. The exception thread does the queue_exception_event 
> request, which ultimately goes to [C] when suspending the process. The 
> servicing thread is not flagged, so it is suspended.
>
> 2) The exception thread does the queue_exception_event first, which 
> goes to [C], but as the servicing thread is still flagged, do not 
> suspend it. The servicing thread then does the select request, goes to 
> [B] which wakes the APC caller -resuming DbgBreakProcess- then unflags 
> and suspends it as it should be.
>
> Now I agree there could still be some race conditions in case 2):
>
> * The additional thread suspension in [B] is done after the wake_up 
> for the APC caller, it should probably be done before, just to be sure.
>
> * If the debugger is calling WaitForDebugEvent in parallel to the 
> DbgBreakProcess call. In this case, the debugger could maybe be 
> notified of a debug event after the exception thread created it but 
> before the servicing thread notifies of the APC completion. I don't if 
> that's supposed to be supported, or how bad it is to have the 
> servicing thread still alive for a bit.
>
> If we want to have the later scenario be race-free, it would require 
> to link the debug even only when the servicing thread is suspended. 
> Or, to synchronize the servicing and exception thread somehow so that 
> the servicing always does its select request first.


I think the more general problem is that any system APCs may be 
interrupted and suspended. If DbgBreakProcess is the only concern, we 
could probably hack it to not wait for APC result at all - it doesn't 
really need to (except for propagating OOM-like errors). For the general 
problem, let me use an example with another thread that just happens to 
suspend the process about the same time as DbgBreakProcess happens:

1. Debuggee thread calls server_select and enters the uninterrupted 
section in wine_server_call, but does not yet send the request

2. Thread 1 (of any process) requests suspending of debuggee process

3. Debugger's thread 2 calls DbgBreakProcess on debuggee process and 
waits for APC completion

4. Server processes suspend request from (2) and sends signals to all 
debuggee threads

5. Server processes break APC request from (3) and queues an APC

6. Debuggee thread from (1) finally sends the request and server replies 
with queued APC

7. Debuggee thread leaves uninterrupted section, and receives queued 
signal, which suspends the thread while it's about to process a system APC

You could maybe detect the problem on server side in (6), return pending 
and let signal handler to handle the APC. I'm a bit sceptical about that 
solution.


Another solution would be to ensure that system APCs are ran with 
signals blocked on client side. It's quite tricky through as it would 
require disabling signals while waiting in server_select (except for 
user APCs). This could be fine if we passed CPU context in select 
request, so that server wouldn't need any signal to suspend the thread.


Jacek




More information about the wine-devel mailing list