[PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().

Mon May 31 16:45:39 CDT 2021

Replying to both mails at once, sort of...

On 5/31/21 4:53 AM, Henri Verbeet wrote:
> Yes, mostly what Matteo said.
> 
> On Fri, 28 May 2021 at 15:39, Matteo Bruni <matteo.mystral at gmail.com> wrote:
>> So this is, AFAIR, more or less what the staging patchset did. I don't
>> think that we want to put the data in the command stream though.
>>
> It may not be too bad for deferred contexts, but for immediate
> contexts probably not, no.
> 
>> Always copying the data from the user-provided source to some other
>> storage is definitely the right choice: we don't want to block the
>> application (more or less directly) to make sure that the data stays
>> around until we need it. That's bad for performance for immediate
>> contexts and just impossible for deferred, so not much to argue there.
>> At the same time we also want to minimize the amount of copies.
>> Ideally, for a GPU resource, here we'd upload the data right to the
>> GPU (e.g. inside a persistently mapped buffer), ready to be used
>> whenever it's time. 

To clarify, when talking about uploading directly to the GPU, are you 
just referring to map() here, or update_sub_resource() as well?

Does it even make sense to use a persistently mapped GPU buffer for a 
resource mapped/updated by a deferred context? Obligatory disclaimer 
that I don't know anything about GPU-side performance, but obviously you 
can't reuse an earlier mapping for UpdateSubResource() or for DISCARD 
maps, and I'm failing to understand why an application would ever use 
NOOVERWRITE on a deferred context.

>> For a CPU resource instead we'd copy the data into
>> its final place in sysmem (what will eventually become its
>> "heap_memory"). We don't quite have the infrastructure for it at the
>> moment but that would be the goal IMO.
>> What you do in patch 2/2 is more along the lines of this idea. The
>> sysmem allocation mechanism works okay for the time being and could be
>> extended to handle other types of memory, e.g. it could eventually
>> take charge of the vulkan allocator, making it generic and usable from
>> GL too. I don't think it has to be managed by each deferred context
>> (i.e. it should probably be per-device), although the current scheme
>> has the advantage of not having to care about concurrency.

In a sense there is no "current scheme", i.e. deferred contexts are 
tracking which memory belongs to which resource because that *has* to be 
done per-context, but nobody's managing custom heaps. Unless I 
misunderstand what's meant here?

>> So, at least in my view, I'd bring the allocator bits from patch 2/2
>> into patch 1/2 and use that for update_sub_resource, both the deferred
>> and the immediate variants (the latter could be its own patch). You'd
>> only need the DISCARD portion for update_sub_resource, so the rest can
>> (and probably should) stay in the map() patch.
>>
> My (very rough) proposal for this would be to replace the "data" field
> in the wined3d_cs_update_sub_resource structure with a
> wined3d_bo_address, and then introduce something like the following:
> 
>      static bool wined3d_device_context_get_upload_bo(struct
> wined3d_device_context *context,
>              struct wined3d_resource *resource, unsigned int
> sub_resource_idx, const struct wined3d_box *box,
>              uint32_t flags, struct wined3d_bo_address *address, void **map_ptr)
>      {
>          ...
>      }

This function strikes me as more than a little confusing, party because 
it seems to be returning two partially overlapping things, with mostly 
different usages...

i.e. I'd presume the idea is for 
wined3d_device_context_update_sub_resource() to retrieve this and then 
call something like wined3d_context_copy_bo_address() to copy from 
sysmem, which makes sense, but then what are we doing with map_ptr?

Or, if we use this in wined3d_device_context_map(), couldn't we just 
call wined3d_context_map_bo_address() on the returned wined3d_bo_address?

> 
> For deferred contexts you should then be able to do something
> (conceptually) very similar to patch 2/2, and in the end we may not
> even need map/unmap/update_sub_resource in the
> wined3d_device_context_ops structure.
> 
We'd still presumably need an unmap, though, unless the idea above is to 
*always* allocate new (GPU or CPU) memory, which seems like a much 
bigger step.

Unless I'm misreading things, this also seems to imply that 
*_resource_sub_resource_map() shouldn't exist anymore, right?