RFC: Rework of wined3d cs fencing

Stefan Dösinger stefandoesinger at gmail.com
Wed Feb 23 08:20:23 CST 2022


Here's another suggestion on how to track resources: Don't. Or at least, not 
unconditionally.

In d3d11 we have default, immutable, dynamic and staging resources. Default 
and immutable resources can't be mapped. Dynamic ones should be mapped with 
no-overwrite or discard, which doesn't require tracking on our end. That 
leaves staging resources, but they can't be used for drawing.

So in the event that a resource is mapped synchronously, just stall the 
pipeline. If it happens too often (or we're running a d3d <= 9 client) we can 
always start to track draws. I see at least World of Tanks sync-mapping 
staging resources, so presumably we need to track CopyResource and friends, 
but those are easier than draws.

The same consideration applies to d3d9ex, since there's no managed pool (well, 
except for that 0x6 re-add-D3DPOOL-MANAGED pool .NET stuff uses). Track copies 
and stall if a D3DPOOL_DEFAULT resource is mapped without async map flags.
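
A minimal sketch of the d3d11 part of this policy, in C. Every sketch_* name
below is hypothetical and is not a wined3d or d3d11 definition. The mapping
from usage and map flags to an action is only one possible reading of the
paragraphs above, not the actual implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for the d3d11 usage classes named above. */
enum sketch_usage
{
    SKETCH_USAGE_DEFAULT,
    SKETCH_USAGE_IMMUTABLE,
    SKETCH_USAGE_DYNAMIC,
    SKETCH_USAGE_STAGING,
};

/* Hypothetical map flags; the real d3d11 values differ. */
#define SKETCH_MAP_DISCARD     0x1u
#define SKETCH_MAP_NOOVERWRITE 0x2u

enum sketch_map_action
{
    SKETCH_MAP_REJECT,          /* resource cannot be mapped */
    SKETCH_MAP_NO_WAIT,         /* async map, nothing to track */
    SKETCH_MAP_WAIT_FOR_COPIES, /* wait only for tracked copy operations */
    SKETCH_MAP_STALL,           /* untracked sync map: drain the pipeline */
};

static enum sketch_map_action sketch_map_policy(enum sketch_usage usage,
        unsigned int flags)
{
    switch (usage)
    {
        case SKETCH_USAGE_DEFAULT:
        case SKETCH_USAGE_IMMUTABLE:
            return SKETCH_MAP_REJECT;

        case SKETCH_USAGE_DYNAMIC:
            /* Discard and no-overwrite maps need no tracking. */
            if (flags & (SKETCH_MAP_DISCARD | SKETCH_MAP_NOOVERWRITE))
                return SKETCH_MAP_NO_WAIT;
            /* Draws are not tracked, so a sync map has to stall. */
            return SKETCH_MAP_STALL;

        case SKETCH_USAGE_STAGING:
            /* Never used for drawing, so only copies can touch it. */
            return SKETCH_MAP_WAIT_FOR_COPIES;
    }

    /* Not reached for valid usage values. */
    assert(0 && "unknown usage");
    return SKETCH_MAP_STALL;
}
```

With this shape, switching an application to full draw tracking would only
change what the SKETCH_MAP_STALL case does.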

I know of d3d9 and earlier applications (e.g. 3DMark2001) abusing managed 
resources and expecting maps to be fast if the resource hasn't been used 
lately. So we need a fallback to the current mode. We already have the 
context->ops->acquire_resource indirection, so we should be able to add it 
without another indirection. Eventually I think we should pull that 
indirection into acquire_shader_resources (write three versions of 
acquire_shader_resources, one that calls resource_acquire directly, one for 
deferred contexts, one no-op) and set the right one for the context. That way 
we don't do an indirect jump for each resource and can no-op the entire thing 
if we know we won't track those resources.
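
A rough sketch of the "three versions, one pointer per context" idea, in C.
The struct layouts and all sketch_* names are hypothetical and do not match
the real wined3d structures. In particular, what the deferred variant does
(recording references in a fixed-size array) is an assumption made only to
keep the example small.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEFERRED 16

struct sketch_resource
{
    unsigned long access_time; /* last queue position that used it */
};

struct sketch_context;

typedef void (*sketch_acquire_fn)(struct sketch_context *ctx,
        struct sketch_resource *const *resources, size_t count);

struct sketch_context
{
    sketch_acquire_fn acquire_shader_resources; /* chosen once per context */
    unsigned long queue_head;
    struct sketch_resource *deferred[SKETCH_MAX_DEFERRED];
    size_t deferred_count;
};

/* Immediate context with tracking: stamp each resource directly. */
static void sketch_acquire_direct(struct sketch_context *ctx,
        struct sketch_resource *const *resources, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i)
        if (resources[i])
            resources[i]->access_time = ctx->queue_head;
}

/* Deferred context: only remember the references (assumed behaviour). */
static void sketch_acquire_deferred(struct sketch_context *ctx,
        struct sketch_resource *const *resources, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i)
        if (resources[i] && ctx->deferred_count < SKETCH_MAX_DEFERRED)
            ctx->deferred[ctx->deferred_count++] = resources[i];
}

/* Untracked context: the whole per-draw loop disappears. */
static void sketch_acquire_noop(struct sketch_context *ctx,
        struct sketch_resource *const *resources, size_t count)
{
    (void)ctx;
    (void)resources;
    (void)count;
}

enum sketch_context_kind
{
    SKETCH_CONTEXT_TRACKED,
    SKETCH_CONTEXT_DEFERRED,
    SKETCH_CONTEXT_UNTRACKED,
};

static void sketch_context_init(struct sketch_context *ctx,
        enum sketch_context_kind kind)
{
    *ctx = (struct sketch_context){0};

    switch (kind)
    {
        case SKETCH_CONTEXT_TRACKED:
            ctx->acquire_shader_resources = sketch_acquire_direct;
            break;
        case SKETCH_CONTEXT_DEFERRED:
            ctx->acquire_shader_resources = sketch_acquire_deferred;
            break;
        case SKETCH_CONTEXT_UNTRACKED:
            ctx->acquire_shader_resources = sketch_acquire_noop;
            break;
    }
}

/* One indirect call per draw instead of one per bound resource. */
static void sketch_draw(struct sketch_context *ctx,
        struct sketch_resource *const *resources, size_t count)
{
    assert(ctx->acquire_shader_resources);
    ctx->acquire_shader_resources(ctx, resources, count);
}
```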


On Friday, 11 February 2022, 13:13:09 EAT, you wrote:
> Hi, below is an email I drafted that explains the acquiring reduction idea.
> 
> Perhaps for draw and dispatch calls we could track most recent times any
> resource is bound and unbound, as well as, globally, the most recent time
> of a draw/dispatch call. We wouldn’t iterate over all bound resources
> during each draw/dispatch, instead, to see if a resource is busy, we would
> check a) if it is currently bound or will be bound, and if so b) if there
> are any draws/dispatches in the queue.
> 
> To be more specific, in struct wined3d_resource we’d keep
> “most_recent_bind_time” and “most_recent_unbind_time”, and struct
> wined3d_cs (or somewhere else) “most_recent_draw_time” and
> “most_recent_dispatch_time”. Then check:
> 
> 1) if tail < most_recent_bind_time or tail < most_recent_unbind_time or
>    most_recent_unbind_time < most_recent_bind_time, the resource is
>    considered bound
> 2) if tail < most_recent_bind_time and tail > most_recent_unbind_time, and
>    most_recent_draw/dispatch_time < most_recent_bind_time, we’re idle
> 3) if the resource is bound, and tail < most_recent_draw/dispatch_time,
>    the resource is busy
> 4) otherwise we're idle.
> 
> That would be in addition to what you proposed, which I think works fine for
> other calls.
> 
> To reduce graphics/compute false positives, “most_recent_bind_time” and
> “most_recent_unbind_time” could be tracked separately for both use cases.
> 
> There’s still some potential for false positives, if the queue contains:
> unbind “A”, …, draw, …, bind “A”, we’d consider resource “A” to be busy
> until the unbind is executed. But maybe that case is benign enough to
> ignore.
> 
> Also, another thing to think about is how to better handle acquiring
> resources in deferred contexts.
> 
> > On 4 Jan 2022, at 16:03, Stefan Dösinger <stefandoesinger at gmail.com>
> > wrote:
> > 
> > Hi,
> > 
> > Before the holidays I spent some time optimizing the cs resource fencing
> > code. The current state is attached for review. I'll send it for
> > upstreaming after the code freeze.
> > 
> > The basic idea is to use the default queue head and tail for fencing. This
> > completely removes any work on the command stream thread side, and the
> > main thread work goes from an interlocked op to a simple assignment.
> > Together with the technically unrelated patch 4 it improves a
> > microbenchmark I wrote for this
> > (https://github.com/stefand/perftest/tree/main/resource_tracking_d3d11)
> > from ~200 fps to ~700 fps on my Ryzen CPU. Other CPUs have lower gains,
> > but still more than double the framerate. It also produces a measurable
> > improvement in Rocket League once other known CS issues are hacked away.
> > 
> > Items for discussion:
> > 
> > 1) I am not entirely sure I do the ULONG / LONG handling correctly. I
> > guess we could get away with just keeping everything as signed LONGs, but
> > technically signed int overflow is undefined behavior. Interlocked ops
> > accept LONG * though...
> > 
> > 2) resource_acquire could be renamed to something else
> > 
> > 3) Separate read and write timestamps. This should be easy to add on top
> > of the current code.
> > 
> > 4) Traversing resource->device->cs->queue in wined3d_resource_acquire is
> > ugly. I'm contemplating passing const struct wined3d_cs or the timestamp
> > to it explicitly.
> > 
> > 5) We still iterate over a huge number of resources. Does anyone have
> > ideas how to cut this down?
> > 
> > Happy New Year,
> > Stefan
> > <0001-wined3d-Use-extra-bits-in-the-queue-head-and-tail-co.patch>
> > <0002-wined3d-Don-t-acquire-the-resource-in-update_sub_res.patch>
> > <0003-wined3d-Use-the-default-queue-index-for-resource-fen.patch>
> > <0004-Move-resource-type-away-from-the-access-time-field.patch>
> > <0005-wined3d-Remove-the-no-op-wined3d_resource_release.patch>
