[PATCH vkd3d v2 4/4] vkd3d: Map timeline semaphore values to fence virtual values and submit commands in worker threads.

Mon May 2 07:51:15 CDT 2022

On Fri, 29 Apr 2022 at 15:21, Conor McCarthy <cmccarthy at codeweavers.com> wrote:
> Correct fence behaviour requires:
>
> Map monotonically increasing timeline values to fence virtual values to
> avoid invalid use of Vulkan timeline semaphores. In particular, non-
> increasing values and value jumps of >= 4G are required for d3d12.
>
> Create a worker thread for each queue to handle queue commands. This
> allows blocking of wait submission until an unblocking signal is
> submitted, so out-of-order waits are handled correctly when two d3d12
> queues are mapped to the same Vulkan queue.
>
> Threaded queue submission also fixes the old fence implementation so it is
> fully functional, though a bit less efficient than timeline semaphores.
>
> Based in part on vkd3d-proton patches by Hans-Kristian Arntzen.
>
> Signed-off-by: Conor McCarthy <cmccarthy at codeweavers.com>
> ---
> v2: Always broadcast on the null event condition after signaling.
>
> Threaded command queues on their own seem to bring out the problems with the
> existing fence implementation, which results in games crashing. Combining them
> with the fence changes avoids that issue.

Is there any reason we couldn't apply the fence changes before the
command queue submission changes?

The commit message could be clearer, but if I understood it correctly,
the command queue submission changes are intended to address the
following scenario:

    - We have a fence F1, and two d3d12 command queues V1 and V2
sharing the same physical Vulkan queue Q1.

    - The application submits a Wait(F1, x) on V1.

    - The application submits a Signal(F1, x) on V2.

    - The Wait() on V1 never completes, because it's submitted to Q1
before the Signal() on V2.

    - And to complicate things some more, V2 may not be created until
after the Wait().

Correct? That explains the need to delay Wait() submission, but not
the need to introduce the worker threads. Is there anything preventing
us from buffering these Wait() calls (and any subsequent Execute()
calls) much like we do in this patch, and then simply submitting them
when the corresponding Signal() is submitted?

If we treat the need for these worker threads as a given for the sake
of argument, the scheme introduced in this patch looks suspiciously
similar to wined3d's command stream. The wined3d CS is probably a fair
bit more complicated than what's needed here, but could we please at
least keep the terminology consistent?

d3d12_command_queue_acquire_serialised() is going to drain (i.e.,
stall) the queue on each vkd3d_acquire_vk_queue() call. That seems
undesirable by itself, but in the case of Wine's dxgi/d3d12, that's
going to happen on every Present() call. Perhaps worse, is there
anything preventing an application from attempting to drain the queue
(e.g. by calling vkd3d_acquire_vk_queue()) between a Wait() and its
corresponding Signal()?