[PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().

Thu Jun 10 22:39:35 CDT 2021

On 5/31/21 7:12 PM, Matteo Bruni wrote:
> As usual my reply ended up being unwieldy and confusing. Hopefully
> it's still helpful...
> 
> On Mon, May 31, 2021 at 11:45 PM Zebediah Figura (she/her)
> <zfigura at codeweavers.com> wrote:
>> On 5/31/21 4:53 AM, Henri Verbeet wrote:
>>> Yes, mostly what Matteo said.
>>>
>>> On Fri, 28 May 2021 at 15:39, Matteo Bruni <matteo.mystral at gmail.com> wrote:
>>>> So this is, AFAIR, more or less what the staging patchset did. I don't
>>>> think that we want to put the data in the command stream though.
>>>>
>>> It may not be too bad for deferred contexts, but for immediate
>>> contexts probably not, no.
>>>
>>>> Always copying the data from the user-provided source to some other
>>>> storage is definitely the right choice: we don't want to block the
>>>> application (more or less directly) to make sure that the data stays
>>>> around until we need it. That's bad for performance for immediate
>>>> contexts and just impossible for deferred, so not much to argue there.
>>>> At the same time we also want to minimize the amount of copies.
>>>> Ideally, for a GPU resource, here we'd upload the data right to the
>>>> GPU (e.g. inside a persistently mapped buffer), ready to be used
>>>> whenever it's time.
>>
>> To clarify, when talking about uploading directly to the GPU, are you
>> just referring to map() here, or update_sub_resource() as well?
> 
> Potentially both cases. A DISCARD map is virtually indistinguishable
> from update_sub_resource().
> 
>> Does it even make sense to use a persistently mapped GPU buffer for a
>> resource mapped/updated by a deferred context? Obligatory disclaimer
>> that I don't know anything about GPU-side performance, but obviously you
>> can't reuse an earlier mapping for UpdateSubResource() or for DISCARD
>> maps, and I'm failing to understand why an application would ever use
>> NOOVERWRITE on a deferred context.
> 
> I don't have all the details of deferred contexts down, but I'm pretty
> sure that you can. All that you need to know is that the memory block
> you return to the application is not in use (be it by wined3d or by
> the underlying API / GPU) by the time the application will access it.
> Whether the memory is backed by a buffer object, suballocated from a
> larger GPU-side resource or a random chunk of system memory is an
> implementation detail.

I think there may be some miscommunication here, but I'm talking about 
the application doing something like:

1. map resource with DISCARD
2. draw from it
3. map resource with DISCARD, filling it with different data
4. draw from it

(3) can't return the same memory as (1) on a deferred context, nor can 
we throw away the contents of (1) when we get (3).

If the application never uses NOOVERWRITE or partial maps (and I don't 
know how common that actually is) it comes down to a single blit to a 
GPU resource. I'm guessing that's not really an ideal place to use a 
persistent map.

> NOOVERWRITE also makes perfect sense with deferred contexts. The
> general idea behind DISCARD and NOOVERWRITE is to gradually "fill" a
> larger resource over a number of draws, without requiring any
> synchronization between the draws.
> This is probably largely or entirely obvious but I'll go through the
> whole story to make sure we're on the same page.
> 
> The natural use case goes like this: the application wants to draw
> some dynamic geometry (i.e. generated at runtime, with variable
> primitive count). At initialization time it creates a couple large
> buffers (e.g. 1 MB each), one for vertices and one for indices. Then
> at runtime it maps both with DISCARD, fills them with whatever data is
> necessary for the current draw (e.g. vertex and index data for 100
> triangles), keeping track of how much memory is still available in
> either buffer, unmaps and draws. After a while it decides that it
> needs to draw some more dynamic stuff. Now it does something very
> similar to the previous time, except if there is still some free space
> available it will map the buffer with NOOVERWRITE and "append" to the
> buffer, writing the new data after the area used by the first draw.
> NOOVERWRITE is a promise to d3d to not touch the data in use by any
> previous draw, which in practice means that the implementation can
> return the same memory as the previous map, rather than blocking to
> make sure that any pending draw using the same buffer is completed.
> Eventually the application will run out of space from one buffer or
> the other. No problem, at that point the application will map the
> relevant buffer with DISCARD and restart filling the buffer from the
> top. Here DISCARD is a different promise to d3d: the application tells
> d3d that it won't try to make use of the old contents of the buffer.
> What it implies for the implementation is that an entirely new
> "physical" buffer can be made available to the application, which will
> take the place of the previous "physical" buffer once pending draws
> are done with.
> Afterwards the application will resume mapping the buffer with
> NOOVERWRITE for the following dynamic draws, until the buffer fills up
> again and so on and so forth. Worth mentioning is that the "new"
> buffer that the implementation can return on a DISCARD map might in
> fact be an old buffer, for example coming from a buffer discarded some
> time earlier, currently unused and sitting idle. Or it might be a
> portion of a larger physical buffer that the implementation knows it's
> not in use.
> 
> Of course applications can and often do find ways to use those
> primitives more creatively, but this is the basic idea. Also, this
> works just as well for immediate or for deferred contexts.
> 
>>>> For a CPU resource instead we'd copy the data into
>>>> its final place in sysmem (what will eventually become its
>>>> "heap_memory"). We don't quite have the infrastructure for it at the
>>>> moment but that would be the goal IMO.
>>>> What you do in patch 2/2 is more along the lines of this idea. The
>>>> sysmem allocation mechanism works okay for the time being and could be
>>>> extended to handle other types of memory, e.g. it could eventually
>>>> take charge of the vulkan allocator, making it generic and usable from
>>>> GL too. I don't think it has to be managed by each deferred context
>>>> (i.e. it should probably be per-device), although the current scheme
>>>> has the advantage of not having to care about concurrency.
>>
>> In a sense there is no "current scheme", i.e. deferred contexts are
>> tracking which memory belongs to which resource because that *has* to be
>> done per-context, but nobody's managing custom heaps. Unless I
>> misunderstand what's meant here?
> 
> Why? Memory is memory and is a per-device resource. All you need to do
> is return some available memory to the application whenever it asks
> for a resource map, and to do that you need to keep track of how
> memory is used and for how long.

The other thing a deferred context needs to do is keep track of the last 
bit of memory it returned for each resource, and return that same bit of 
memory in case of a NOOVERWRITE map.

Actually I guess we don't *need* to do that; we could return a new bit 
of memory; but Windows does (which means that applications might depend 
on it), and it strikes me as better for performance...

>>>> So, at least in my view, I'd bring the allocator bits from patch 2/2
>>>> into patch 1/2 and use that for update_sub_resource, both the deferred
>>>> and the immediate variants (the latter could be its own patch). You'd
>>>> only need the DISCARD portion for update_sub_resource, so the rest can
>>>> (and probably should) stay in the map() patch.
>>>>
>>> My (very rough) proposal for this would be to replace the "data" field
>>> in the wined3d_cs_update_sub_resource structure with a
>>> wined3d_bo_address, and then introduce something like the following:
>>>
>>>       static bool wined3d_device_context_get_upload_bo(struct
>>> wined3d_device_context *context,
>>>               struct wined3d_resource *resource, unsigned int
>>> sub_resource_idx, const struct wined3d_box *box,
>>>               uint32_t flags, struct wined3d_bo_address *address, void **map_ptr)
>>>       {
>>>           ...
>>>       }
>>
>> This function strikes me as more than a little confusing, party because
>> it seems to be returning two partially overlapping things, with mostly
>> different usages...
>>
>> i.e. I'd presume the idea is for
>> wined3d_device_context_update_sub_resource() to retrieve this and then
>> call something like wined3d_context_copy_bo_address() to copy from
>> sysmem, which makes sense, but then what are we doing with map_ptr?
>>
>> Or, if we use this in wined3d_device_context_map(), couldn't we just
>> call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
> 
> As I understand it, the idea is to return map_ptr to the application
> and keep track of address internally (i.e. it's the reference to the
> "physical resource" associated with the memory returned via map_ptr).
> You'll want to keep track of those struct wined3d_bo_address and in
> particular of when they'll become available again at some point in the
> future.
> 
>>> For deferred contexts you should then be able to do something
>>> (conceptually) very similar to patch 2/2, and in the end we may not
>>> even need map/unmap/update_sub_resource in the
>>> wined3d_device_context_ops structure.
>>>
>> We'd still presumably need an unmap, though, unless the idea above is to
>> *always* allocate new (GPU or CPU) memory, which seems like a much
>> bigger step.
> 
> I'm not entirely on top of the latest changes in wined3d and I'm not
> sure what we want the various _ops structs to look like eventually,
> but AFAIU we do want to always allocate new memory for DISCARD maps
> (and we basically do that already, both for Vulkan and GL, although in
> a somewhat roundabout fashion). As I mentioned earlier, it doesn't
> have to be actually new memory, if we happen to have some available
> memory around we can use it instead.
> I think the point is that maps really need to go through
> wined3d_device_context_ops / the CS only because, sometimes, you need
> new memory. Once you have a call that caters to that situation, you
> should be set. All the other situations can ideally be handled on the
> "client" side, without crossing the CS boundary. There are a few
> complications (what about READ maps from a STAGING resource? updates
> to a non-DYNAMIC buffer?) but those are the special cases, the
> WRITE-only DISCARD / NOOVERWRITE dynamic buffer map is the interesting
> use case that deserves to be streamlined. For deferred contexts, it's
> the only allowed case and that's no coincidence.
> 
>> Unless I'm misreading things, this also seems to imply that
>> *_resource_sub_resource_map() shouldn't exist anymore, right?
> 
> That's a separate question I think, as that's the API the upper
> wined3d "layers" use to access physical resources. It might need to be
> modified along the way to better handle these new concepts but it
> seems likely that we'll still need something in the same fashion
>