Replying to both mails at once, sort of...
On 5/31/21 4:53 AM, Henri Verbeet wrote:
> Yes, mostly what Matteo said.
>
> On Fri, 28 May 2021 at 15:39, Matteo Bruni <matteo.mystral(a)gmail.com> wrote:
>> So this is, AFAIR, more or less what the staging patchset did. I don't
>> think that we want to put the data in the command stream though.
> It may not be too bad for deferred contexts, but for immediate
> contexts probably not, no.

>> Always copying the data from the user-provided source to some other
>> storage is definitely the right choice: we don't want to block the
>> application (more or less directly) to make sure that the data stays
>> around until we need it. That's bad for performance for immediate
>> contexts and just impossible for deferred, so not much to argue there.
>> At the same time we also want to minimize the amount of copies.
>> Ideally, for a GPU resource, here we'd upload the data right to the
>> GPU (e.g. inside a persistently mapped buffer), ready to be used
>> whenever it's time.

To clarify, when talking about uploading directly to the GPU, are you
just referring to map() here, or update_sub_resource() as well?

Does it even make sense to use a persistently mapped GPU buffer for a
resource mapped/updated by a deferred context? Obligatory disclaimer
that I don't know anything about GPU-side performance, but obviously you
can't reuse an earlier mapping for UpdateSubresource() or for DISCARD
maps, and I'm failing to understand why an application would ever use
NOOVERWRITE on a deferred context.
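
For reference, the pattern I understand NOOVERWRITE to exist for looks
roughly like this on an immediate context. App-side illustration only:
"context" and "buffer" are assumed to already exist, and return values
are ignored.

D3D11_MAPPED_SUBRESOURCE map_desc;

/* DISCARD hands the application a fresh piece of memory... */
ID3D11DeviceContext_Map(context, (ID3D11Resource *)buffer, 0,
        D3D11_MAP_WRITE_DISCARD, 0, &map_desc);
/* ...write the first batch of vertices... */
ID3D11DeviceContext_Unmap(context, (ID3D11Resource *)buffer, 0);

/* ...and NOOVERWRITE then promises not to touch any range the GPU may
 * still be reading, so the driver needn't stall or rename. On a
 * deferred context we can't know when the commands will execute
 * relative to GPU reads, so I don't see what the promise buys us. */
ID3D11DeviceContext_Map(context, (ID3D11Resource *)buffer, 0,
        D3D11_MAP_WRITE_NO_OVERWRITE, 0, &map_desc);
/* ...append past the previously written range... */
ID3D11DeviceContext_Unmap(context, (ID3D11Resource *)buffer, 0);
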
>> For a CPU resource instead we'd copy the data into
>> its final place in sysmem (what will eventually become its
>> "heap_memory"). We don't quite have the infrastructure for it at the
>> moment but that would be the goal IMO.
>>
>> What you do in patch 2/2 is more along the lines of this idea. The
>> sysmem allocation mechanism works okay for the time being and could be
>> extended to handle other types of memory, e.g. it could eventually
>> take charge of the vulkan allocator, making it generic and usable from
>> GL too. I don't think it has to be managed by each deferred context
>> (i.e. it should probably be per-device), although the current scheme
>> has the advantage of not having to care about concurrency.

In a sense there is no "current scheme", i.e. deferred contexts are
tracking which memory belongs to which resource because that *has* to be
done per-context, but nobody's managing custom heaps. Unless I
misunderstand what's meant here?
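
To make the concurrency point concrete, here's a purely hypothetical
sketch of what sharing an allocator across contexts would cost; none of
these names exist in the tree, and heap_alloc() is just a stand-in for
real chunk suballocation:

struct upload_heap
{
    CRITICAL_SECTION cs; /* the cost of sharing across contexts */
    struct list chunks;  /* recyclable suballocations */
};

static void *upload_heap_alloc(struct upload_heap *heap, SIZE_T size)
{
    void *mem;

    EnterCriticalSection(&heap->cs);
    mem = heap_alloc(size); /* placeholder for chunk suballocation */
    LeaveCriticalSection(&heap->cs);
    return mem;
}

A per-(deferred-)context allocator gets to skip the lock, which I take
to be the advantage mentioned above.
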
> So, at least in my view, I'd bring the allocator bits from patch 2/2
> into patch 1/2 and use that for update_sub_resource, both the deferred
> and the immediate variants (the latter could be its own patch). You'd
> only need the DISCARD portion for update_sub_resource, so the rest can
> (and probably should) stay in the map() patch.
>
> My (very rough) proposal for this would be to replace the "data" field
> in the wined3d_cs_update_sub_resource structure with a
> wined3d_bo_address, and then introduce something like the following:
>
> static bool wined3d_device_context_get_upload_bo(struct wined3d_device_context *context,
>         struct wined3d_resource *resource, unsigned int sub_resource_idx,
>         const struct wined3d_box *box, uint32_t flags,
>         struct wined3d_bo_address *address, void **map_ptr)
> {
>     ...
> }

This function strikes me as more than a little confusing, partly because
it seems to be returning two partially overlapping things, with mostly
different usages...
i.e. I'd presume the idea is for
wined3d_device_context_update_sub_resource() to retrieve this and then
call something like wined3d_context_copy_bo_address() to copy from
sysmem, which makes sense, but then what are we doing with map_ptr?
Or, if we use this in wined3d_device_context_map(), couldn't we just
call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
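
To spell out the usage I'd presume, a sketch (not working code; I'm
hand-waving the size computation and the wined3d_context argument, and
assuming the flags are the usual WINED3D_MAP_* ones):

static void wined3d_device_context_update_sub_resource(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx,
        const struct wined3d_box *box, const void *data, unsigned int size)
{
    struct wined3d_bo_address src, address;
    void *map_ptr;

    if (!wined3d_device_context_get_upload_bo(context, resource, sub_resource_idx,
            box, WINED3D_MAP_WRITE | WINED3D_MAP_DISCARD, &address, &map_ptr))
        return;

    src.buffer_object = 0;
    src.addr = (BYTE *)data; /* the user-provided sysmem pointer */
    wined3d_context_copy_bo_address(..., &address, &src, size);

    /* ...then queue the op with "address" replacing the old "data"
     * field. Note that map_ptr went entirely unused here, hence my
     * confusion. */
}
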
> For deferred contexts you should then be able to do something
> (conceptually) very similar to patch 2/2, and in the end we may not
> even need map/unmap/update_sub_resource in the
> wined3d_device_context_ops structure.

We'd still presumably need an unmap, though, unless the idea above is to
*always* allocate new (GPU or CPU) memory, which seems like a much
bigger step.
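
That is, even in the best case I'd expect the ops table to retain
something along these lines (the member's signature is my guess):

struct wined3d_device_context_ops
{
    ...
    /* Still needed so the client can signal it's done writing. */
    void (*unmap)(struct wined3d_device_context *context,
            struct wined3d_resource *resource, unsigned int sub_resource_idx);
};
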
Unless I'm misreading things, this also seems to imply that
*_resource_sub_resource_map() shouldn't exist anymore, right?