[PATCH] wineusb.sys: Remove device from the device list on PDO removal
Zebediah Figura (she/her)
zfigura at codeweavers.com
Fri Aug 20 20:43:59 CDT 2021
On 8/2/21 3:04 PM, Zebediah Figura (she/her) wrote:
> On 8/2/21 11:59 AM, Torge Matthies wrote:
>> Fixes a crash when shutting down a prefix.
>>
>> Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=51479
>> Signed-off-by: Torge Matthies <openglfreak at googlemail.com>
>> ---
>> dlls/wineusb.sys/wineusb.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/dlls/wineusb.sys/wineusb.c b/dlls/wineusb.sys/wineusb.c
>> index fae297915fc..28a36b9f380 100644
>> --- a/dlls/wineusb.sys/wineusb.c
>> +++ b/dlls/wineusb.sys/wineusb.c
>> @@ -413,12 +413,15 @@ static NTSTATUS pdo_pnp(DEVICE_OBJECT
>> *device_obj, IRP *irp)
>> break;
>> case IRP_MN_REMOVE_DEVICE:
>> + EnterCriticalSection(&wineusb_cs);
>> remove_pending_irps(device);
>> libusb_unref_device(device->libusb_device);
>> libusb_close(device->handle);
>> + list_remove(&device->entry);
>> IoDeleteDevice(device->device_obj);
>> + LeaveCriticalSection(&wineusb_cs);
>> ret = STATUS_SUCCESS;
>> break;
>>
>
> This looks wrong; the device could have been removed by
> remove_usb_device().
>
> [1] implies that we actually shouldn't remove the device at all unless
> we reported it missing via IRP_MN_QUERY_DEVICE_RELATIONS, which probably
> implies that we shouldn't remove and delete the device here, but should
> instead remove (and delete?) it from the parent
> IRP_MN_SURPRISE_REMOVAL/IRP_MN_REMOVE_DEVICE. It's not particularly
> clear though. I'll have to run some tests and check native drivers, but
> I won't have access to those until next week.
>
> [1]
> https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-device-in-a-bus-driver
>
>
(TL;DR: Lots of research and observation follows. I'm currently writing
patches that solve the initial bug.)
I've done more research, and the situation is horrible. I think this is
not only the worst API I've ever seen, but it would be the worst even if
I wasn't considering the fact that it's a kernel API.
The first, horrible thing, is that despite what I said in [2],
IRP_MN_REMOVE_DEVICE isn't actually guaranteed to be the last IRP
received. It's only guaranteed to be the last IRP under certain
conditions, which is why [1] instructs to retain the PDO if it hasn't
actually been physically removed (*and* that this fact has been reported
to the system).
In particular, apparently if you expose a child PDO with RawDeviceOK set
to 1 but which has a function driver, Windows will send
IRP_MN_START_DEVICE, then IRP_MN_QUERY_REMOVE_DEVICE, then
IRP_MN_REMOVE_DEVICE, then attach a function driver and restart the same
device object. Which means that you can't actually delete the device.
What then becomes unclear is what you're supposed to do if the parent
device was deleted (which is, essentially, what we do at shutdown as
well. I don't know if Windows does this at shutdown; I remember trying
to test and being unable to conclude that Windows does or doesn't try to
tear down the device tree—if it does, it does so late that it's
impossible to log this fact anywhere).
According to [3] a function driver is supposed to defer deleting the
child PDO even when the child PDO has received IRP_MN_REMOVE_DEVICE, and
instead delete it in the parent FDO's IRP_MN_REMOVE_DEVICE. You'd think
that the child could take its cue from IRP_MN_SURPRISE_REMOVAL that it
can safely delete itself, but no.
The "toaster" driver sample from the Windows SDK, which has been my
source of a lot of this, corroborates this—it deletes all child PDOs in
the parent FDO's IRP_MN_REMOVE_DEVICE. However, it also says this:
// Typically the system removes all the children before
// removing the parent FDO. If for any reason child Pdos are
// still present we will destroy them explicitly, with one
exception -
// we will not delete the PDOs that are in
SurpriseRemovePending state.
And it also does something which I can't find documented anywhere: in
the parent FDO's IRP_MN_SURPRISE_REMOVAL it marks all child devices as
"reported missing" so that they *are* deleted in the (child PDO)
IRP_MN_REMOVE_DEVICE. It also removes them from the device list so that
they aren't reported in a subsequent call to
IRP_MN_QUERY_DEVICE_RELATIONS. It doesn't actually call
IoInvalidateDeviceRelations() so this happens the "usual" way—no, it
just pretends that the PnP manager already knows.
And then I tried writing some tests with native hidclass to see what
that does, and as far as I can tell it doesn't do that thing: it keeps
reporting the child in response to IRP_MN_QUERY_DEVICE_RELATIONS, even
if I call that IRP from the (child PDO) IRP_MN_REMOVE_DEVICE. And it
doesn't seem to delete the child PDO either, even as late as the
minidriver FDO's IRP_MN_REMOVE_DEVICE. (Maybe it calls the minidriver
IRP_MJ_PNP handler and then deletes children? It's basically impossible
to confirm that it does delete the children and doesn't leak them...)
A different driver sample, "gameenum" from the Windows XP DDK, seems to
match the documentation and hidclass behaviour. It doesn't delete
anything in FDO IRP_MN_SURPRISE_REMOVAL, and doesn't delete anything in
the PDO removal IRPs, but deletes all of the child PDOs in the FDO
IRP_MN_REMOVE_DEVICE. It also keeps reporting devices in
IRP_MN_QUERY_DEVICE_RELATIONS until the FDO IRP_MN_REMOVE_DEVICE.
Ultimately I think it might not even matter all that much. I think we
could get away with removing the device in the PDO IRP_MN_REMOVE_DEVICE
and things would still work. I also checked and Windows seems to ignore
if IoInvalidateDeviceRelations() is called while removing a device;
which is especially important for Wine as we can reasonably get hotplug
events while trying to shut down the system. That does mean that we
won't always get IRP_MN_REMOVE_DEVICE for a child device, though, which
means that we will always need to clean up leftover child devices in the
FDO IRP_MN_REMOVE_DEVICE. Given that it may not be a bad idea to just
always clean up child devices there.
That's roughly how we handled things before 84c9206ca. That commit
reused "removed" for a second purpose—to finish and prevent any pending
IRPs so that a child or grandchild device wouldn't get stuck waiting for
them in IRP_MN_REMOVE_DEVICE (which of course executes after parent
SURPRISE_REMOVAL but before parent REMOVE_DEVICE). I think it would be
reasonable to also unlink the device in the child (PDO)
IRP_MN_SURPRISE_REMOVAL, and just continue to reuse "removed" for
multiple purposes.
We could also follow closer to the documentation and the hidgame sample,
though, and use separate per-device flags to track whether the device
has been unplugged and whether it is delete-pending (as it were). It's
not really clear to me that we need to, though maybe it's better for
clarity anyway...
ἔρρωσθε,
Zebediah
[1]
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-device-in-a-bus-driver
[2] https://www.winehq.org/pipermail/wine-devel/2021-June/189773.html
[3]
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-device-in-a-function-driver
More information about the wine-devel
mailing list