[PATCH v2 00/22] ntdll: Relax the loader locking to match newer Windows behaviour.

Tue Oct 5 17:49:06 CDT 2021

Changes from the previous version accross the patches:
     - use ldr_data_srw_lock instead of introducing a new one when modifying
       WINE_MODREF data with a shared loader lock.
     - protect cached_modref with ldr_data_srw_lock instead of introducing interlocked access;
     - remove redundant 'volatile' qualifier from locked_exclusive;
     - fix and improve tests, split them across the patchset.

I actually mean to send first 7 patches from this series for review
and not 22 at once but it feels like the changes introduced by these
first patches make sense in this exact form only if the general
idea of locking rework is accepted. So I am sending the whole
thing in its current state in a hope it would be possible to see what
I am suggesting and make a decision on the general idea at least.
I did some cleanup on all the functional patches, although would
look once again at those below patch 7 at least before sending
that for review. The test in the last patch obviously needs
more work, it works well for me on the recent enough Windows 10 but not
that well yet on earlier versions, also needs cleaning up and splitting.

Some facts about the current Windows loader behaviour:
    - It seems that loader lock (the one that can be managed externally
through LdrLockLoaderLock) is held only when calling the app's functions
from the loader: DLL entry point, TLS callbacks, LDR notifications and
LdrEnumerateLoadedModules callback.
There is a quirk (at least on the
newer Win10) with LDR notifications: the loader lock is held during
the unload event but not during the load event, at least the first
one. I am guessing that maybe it is unheld only until the first
dependency DLL entry point is called (so they take it once there and
then keep until process attach ends, leaving first load notification
behind), but I didn't test that with the DLL with dependencies yet.
In any case it looks to me this quirk may be left for later.
    - There is apparently other lock mechanics involved as Windows
does not allow loading process of different DLLs to go in parallel;
    - It looks like any function which does not end up requiring
actual loading or unloading of the DLL and does not query anything
from the module currently being loaded or unloaded can be executed in
parallel with the current load process. E. g., LoadLibrary("b.dll")
won't ever block if the DLL is already loaded, LdrAddRefDll never
blocks at all even if called for the module currently being loaded
(hanging in dll entry point process attach) or unloaded (hanging in
detach), LdrUnloadDll will only block if it decrements load count
to zero.
    - The behaviour of the functions operating on the module being
currently loaded or unloaded looks quirky. Here is what I recovered
from testing on the 19043.1237 (that is mostly covered in the test
in the last patch):
        - LdrUnloadDll blocks until the DLL load sequence is complete
and succeeds without waiting if unload sequence is in progress;
        - LdrAddRefDll never blocks but behaviour depends on whether
it is called from the app callback (tested with entry point and LDR
notification) or outside of that (patch 0014).
        - LdrGetDllHandleEx blocks during module unload but not load;
        - LdrGetProcedureAddress waits until the module is being loaded
and immediately returns an error if module is being unloaded;
        - Looks like calling functions from LDR notifications is the
special case. Those that wouldn't block work, but those that would
block if called from outside return error STATUS_NOT_FOUND. That is
apparently not how that behaves from DLL entry points and TLS
callbacks.

    I think the main patch in the series is patch 0008 which introduces
exclusive locks downgrade / upgrade functions. The idea is that we
can take shared or exclusive lock from loader function. The exclusive
lock can be downgraded to shared. However, the thread still holds
exlusiveness in this state in a sense that other exclusive lock cannot
be taken until the initial exclusive lock is released. Shared lock
waiters can joing when the exlusive lock is downgraded. This works
recursively so loader functions can call each other and call outside
callbacks with the lock held, with the exception that exclusive lock
cannot be taken recursively if shared is currently held.
So, the way of using it is:
     - Never call the callbacks with read lock held: that will assert
once the application calls some loader function which takes the
exclusive lock.
     - The way to call the callbacks is to have an exclusive lock
and downgrade it to shared, then the recursive taking of exclusive
lock is ok;
     - If it is unclear at function start whether exclusive lock is
needed or not, the way to go is to start from shared lock, try to
perform the operation (in case of LdrAddRefDll / LdrUnloadLibrary
I used an extra short lived SRW lock), and if that is not possible
release the shared lock and restart from scratch with exclusive.
    I think maybe it would be technically possible not to introduce the
limitation and allow upgrading shared locks to exclusive but it seems
it would make things much more complicated than they are: it would
require some neater and finer grained structure access sync
to avoid invalidating all the data the shared locker depends on at
the moment it wants to upgrade to exclusive and yields to exclusive
library load / unload sequence.

Signed-off-by: Paul Gofman <pgofman at codeweavers.com>
-- 
2.31.1