[PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().

Matteo Bruni matteo.mystral at gmail.com
Mon May 31 19:12:12 CDT 2021


As usual my reply ended up being unwieldy and confusing. Hopefully
it's still helpful...

On Mon, May 31, 2021 at 11:45 PM Zebediah Figura (she/her)
<zfigura at codeweavers.com> wrote:
> On 5/31/21 4:53 AM, Henri Verbeet wrote:
> > Yes, mostly what Matteo said.
> >
> > On Fri, 28 May 2021 at 15:39, Matteo Bruni <matteo.mystral at gmail.com> wrote:
> >> So this is, AFAIR, more or less what the staging patchset did. I don't
> >> think that we want to put the data in the command stream though.
> >>
> > It may not be too bad for deferred contexts, but for immediate
> > contexts probably not, no.
> >
> >> Always copying the data from the user-provided source to some other
> >> storage is definitely the right choice: we don't want to block the
> >> application (more or less directly) to make sure that the data stays
> >> around until we need it. That's bad for performance for immediate
> >> contexts and just impossible for deferred, so not much to argue there.
> >> At the same time we also want to minimize the amount of copies.
> >> Ideally, for a GPU resource, here we'd upload the data right to the
> >> GPU (e.g. inside a persistently mapped buffer), ready to be used
> >> whenever it's time.
>
> To clarify, when talking about uploading directly to the GPU, are you
> just referring to map() here, or update_sub_resource() as well?

Potentially both cases. A DISCARD map is virtually indistinguishable
from update_sub_resource().

> Does it even make sense to use a persistently mapped GPU buffer for a
> resource mapped/updated by a deferred context? Obligatory disclaimer
> that I don't know anything about GPU-side performance, but obviously you
> can't reuse an earlier mapping for UpdateSubResource() or for DISCARD
> maps, and I'm failing to understand why an application would ever use
> NOOVERWRITE on a deferred context.

I don't have all the details of deferred contexts down, but I'm pretty
sure that you can. All that you need to know is that the memory block
you return to the application is not in use (be it by wined3d or by
the underlying API / GPU) by the time the application accesses it.
Whether the memory is backed by a buffer object, suballocated from a
larger GPU-side resource or a random chunk of system memory is an
implementation detail.

NOOVERWRITE also makes perfect sense with deferred contexts. The
general idea behind DISCARD and NOOVERWRITE is to gradually "fill" a
larger resource over a number of draws, without requiring any
synchronization between the draws.
This is probably largely or entirely obvious but I'll go through the
whole story to make sure we're on the same page.

The natural use case goes like this: the application wants to draw
some dynamic geometry (i.e. generated at runtime, with a variable
primitive count). At initialization time it creates a couple of large
buffers (e.g. 1 MB each), one for vertices and one for indices. Then
at runtime it maps both with DISCARD, fills them with whatever data is
necessary for the current draw (e.g. vertex and index data for 100
triangles), keeping track of how much memory is still available in
either buffer, unmaps and draws. After a while it decides that it
needs to draw some more dynamic stuff. Now it does something very
similar to the previous time, except that, if there is still some free
space available, it maps the buffer with NOOVERWRITE and "appends" to
the buffer, writing the new data after the area used by the first
draw. NOOVERWRITE is a promise to d3d not to touch the data in use by
any previous draw, which in practice means that the implementation can
return the same memory as the previous map, rather than blocking to
make sure that any pending draw using the same buffer is completed.

Eventually the application will run out of space in one buffer or the
other. No problem: at that point the application maps the relevant
buffer with DISCARD and restarts filling the buffer from the top. Here
DISCARD is a different promise to d3d: the application tells d3d that
it won't try to make use of the old contents of the buffer. What that
implies for the implementation is that an entirely new "physical"
buffer can be made available to the application, which will take the
place of the previous "physical" buffer once pending draws are done
with it.

Afterwards the application resumes mapping the buffer with NOOVERWRITE
for the following dynamic draws, until the buffer fills up again, and
so on and so forth. Worth mentioning is that the "new" buffer the
implementation returns on a DISCARD map might in fact be an old
buffer, for example one coming from a buffer discarded some time
earlier, currently unused and sitting idle. Or it might be a portion
of a larger physical buffer that the implementation knows is not in
use.

Of course applications can and often do find ways to use those
primitives more creatively, but this is the basic idea. Also, this
works just as well for immediate contexts as for deferred ones.

> >> For a CPU resource instead we'd copy the data into
> >> its final place in sysmem (what will eventually become its
> >> "heap_memory"). We don't quite have the infrastructure for it at the
> >> moment but that would be the goal IMO.
> >> What you do in patch 2/2 is more along the lines of this idea. The
> >> sysmem allocation mechanism works okay for the time being and could be
> >> extended to handle other types of memory, e.g. it could eventually
> >> take charge of the vulkan allocator, making it generic and usable from
> >> GL too. I don't think it has to be managed by each deferred context
> >> (i.e. it should probably be per-device), although the current scheme
> >> has the advantage of not having to care about concurrency.
>
> In a sense there is no "current scheme", i.e. deferred contexts are
> tracking which memory belongs to which resource because that *has* to be
> done per-context, but nobody's managing custom heaps. Unless I
> misunderstand what's meant here?

Why? Memory is memory and is a per-device resource. All you need to do
is return some available memory to the application whenever it asks
for a resource map, and to do that you need to keep track of how
memory is used and for how long.

> >> So, at least in my view, I'd bring the allocator bits from patch 2/2
> >> into patch 1/2 and use that for update_sub_resource, both the deferred
> >> and the immediate variants (the latter could be its own patch). You'd
> >> only need the DISCARD portion for update_sub_resource, so the rest can
> >> (and probably should) stay in the map() patch.
> >>
> > My (very rough) proposal for this would be to replace the "data" field
> > in the wined3d_cs_update_sub_resource structure with a
> > wined3d_bo_address, and then introduce something like the following:
> >
> >      static bool wined3d_device_context_get_upload_bo(struct
> > wined3d_device_context *context,
> >              struct wined3d_resource *resource, unsigned int
> > sub_resource_idx, const struct wined3d_box *box,
> >              uint32_t flags, struct wined3d_bo_address *address, void **map_ptr)
> >      {
> >          ...
> >      }
>
> This function strikes me as more than a little confusing, partly because
> it seems to be returning two partially overlapping things, with mostly
> different usages...
>
> i.e. I'd presume the idea is for
> wined3d_device_context_update_sub_resource() to retrieve this and then
> call something like wined3d_context_copy_bo_address() to copy from
> sysmem, which makes sense, but then what are we doing with map_ptr?
>
> Or, if we use this in wined3d_device_context_map(), couldn't we just
> call wined3d_context_map_bo_address() on the returned wined3d_bo_address?

As I understand it, the idea is to return map_ptr to the application
and keep track of address internally (i.e. it's the reference to the
"physical resource" associated with the memory returned via map_ptr).
You'll want to keep track of those struct wined3d_bo_address, and in
particular of when each of them will become available again.

> > For deferred contexts you should then be able to do something
> > (conceptually) very similar to patch 2/2, and in the end we may not
> > even need map/unmap/update_sub_resource in the
> > wined3d_device_context_ops structure.
> >
> We'd still presumably need an unmap, though, unless the idea above is to
> *always* allocate new (GPU or CPU) memory, which seems like a much
> bigger step.

I'm not entirely on top of the latest changes in wined3d and I'm not
sure what we want the various _ops structs to look like eventually,
but AFAIU we do want to always allocate new memory for DISCARD maps
(and we basically do that already, both for Vulkan and GL, although in
a somewhat roundabout fashion). As I mentioned earlier, it doesn't
have to actually be new memory; if we happen to have some available
memory around, we can use it instead.
I think the point is that maps really need to go through
wined3d_device_context_ops / the CS only because, sometimes, you need
new memory. Once you have a call that caters to that situation, you
should be set. All the other situations can ideally be handled on the
"client" side, without crossing the CS boundary. There are a few
complications (what about READ maps from a STAGING resource? updates
to a non-DYNAMIC buffer?) but those are the special cases, the
WRITE-only DISCARD / NOOVERWRITE dynamic buffer map is the interesting
use case that deserves to be streamlined. For deferred contexts, it's
the only allowed case and that's no coincidence.

> Unless I'm misreading things, this also seems to imply that
> *_resource_sub_resource_map() shouldn't exist anymore, right?

That's a separate question I think, as that's the API the upper
wined3d "layers" use to access physical resources. It might need to be
modified along the way to better handle these new concepts, but it
seems likely that we'll still need something in the same fashion.