[PATCH v4 1/8] server: Actually set initial status in set_async_direct_result handler.

Tue Mar 8 09:33:00 CST 2022

On 3/8/22 03:11, Zebediah Figura wrote:
> On 3/5/22 01:52, Jinoh Kang wrote:
>> On 3/5/22 09:13, Zebediah Figura wrote:
>>> On 3/4/22 13:04, Jinoh Kang wrote:
>>>> On 3/5/22 02:47, Zebediah Figura wrote:
>>>>> Right, but I think it's possible to trigger this assert now. For a rather pathological case, consider a client which calls set_async_direct_result on an async which is currently being serviced by a device driver and has not yet reached get_next_device_request.
>>>>
>>>> For that to be possible, something has to terminate the async so that set_async_direct_result does not reject it.
>>>> The easiest way to do so is to cancel it, or make it timeout somehow.
>>>
>>> Okay, I don't think that's possible for a device async. I'm still more comfortable removing the assert in this patch, though; I'm not sure that there isn't a way to wrangle a server crash...
>>
>> Understood.
>>
>>>
>>>>
>>>> In fact, I think we have discovered an _existing_ bug: the client can use set_async_direct_result to interfere with cancelled asyncs (device-backed or not).
>>>
>>> Sure, but the client is basically considered free to sabotage itself.
>>
>> My concern was that it might interfere with the driver's assumptions; looks like it shouldn't happen, though.
>>
>>> We just can't let it crash the server, or mess with an object that it doesn't have the right access bits to.
>>>
>>>>
>>>> Further investigation revealed some other bugs, which is tangentially related:
>>>>
>>>> 1. An async may time out before the client calls set_async_direct_result to report back the status of initial I/O.  In this case, the timeout is ignored.
>>>
>>> I don't think this is a problem /a priori/, though.
>>
>> The problem is that a timeout ignored *once* is ignored *forever* throughout the async's lifetime.
>> We don't restore the timeout at all once the async enters the pending state.
>>
>> An example scenario is as follows:
>>
>> 1. Client calls recv_socket.
>> 2. Server sets async timeout and returns.
>> 3. The timeout is triggered, but the async is already in terminated state; thus, it is ignored.
>> 4. Client calls set_async_direct_result with status = STATUS_PENDING.
>> 5. Now, the async will potentially linger forever now that the timeout has already been expired.
> 
> Ah, right. Unless I'm mistaken that's also a preëxisting problem.

Alright. I'm fixing this in separate patch series, then.

> 
>>
>>> In fact, I'm somewhat concerned that if we let synchronous I/O time out before it even had a chance to start, we would break applications that expect it to work.
>>>
>>>> 2. There is a time window where alerted asyncs ignore IoCancel/IoCancelEx requests.  If async_terminate( async, STATUS_ALERTED ) is called, the async will ignore any cancellation requests until the APC_ASYNC_IO routine returns.
>>>>      Note that this bug has existed *before* set_async_direct_result, and I think this one may need to be addressed first.
>>>
>>> Yes, this has been a bug for a while. I don't remember if this was a problem before the async rework in 7.0. We probably need to "buffer" the cancellation request somehow, and process it once the async is restarted. (As above, I don't think we want to cancel the async if the client is going to report success to us.)
>>
>> Sounds like a great idea.  Not sure which one is the best way to implement it, though.  A few options come to my mind:
>>
>> - Allow async_terminate() to override status if async->alerted is set, even if it is already terminated.
> 
> We used to have this (before a5b6e90d48e), but I don't think this actually helps. If the client restarts we're back where we started. We could account for it there but I don't much like the added complexity.
> 
>> - Add a separate status field for buffering cancellation requests.
> 
> This seems more desirable to me. We'll need to do the same thing with timeouts.

Perhaps add cancel_status in struct async?  It would hold STATUS_TIMEOUT or STATUS_CANCELLED (or whatever).

> 
>>
>>>
>>>>
>>>> So what I think we need here is a mechanism to tell alerted asyncs apart from irreversibly-terminated asyncs.
>>>> This will solve all of the problems above:
>>>>
>>>> - It's no longer possible to call set_async_direct_result on a device async in the first place, since it will now reject irreversibly-terminated asyncs while still accepting alerted asyncs (which is not possible for device asyncs).
>>>> - Timeout and cancellation works as expected: set_async_direct_result will treat timed out asyncs as irreversibly-terminated.  (the client needs to be modified to handle this case explicitly, though.)
>>>>
>>>> Any thoughts?
>>>
>>> Maybe I'm missing something, but isn't that what the "alerted" field is for?
>>
>> You're right. Sorry for missing that.  Also, the cancellation buffering approach you suggested makes this unnecessary anyway.
>>

-- 
Sincerely,
Jinoh Kang