[PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().

Fri May 28 08:39:18 CDT 2021

On Fri, May 28, 2021 at 7:22 AM Zebediah Figura <z.figura12 at gmail.com> wrote:
>
> Signed-off-by: Zebediah Figura <z.figura12 at gmail.com>
> ---
>  dlls/wined3d/cs.c | 37 +++++++++++++++++++++++++++++++++----
>  1 file changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/dlls/wined3d/cs.c b/dlls/wined3d/cs.c
> index 50a08334dab..e6f84134795 100644
> --- a/dlls/wined3d/cs.c
> +++ b/dlls/wined3d/cs.c
> @@ -490,6 +490,7 @@ struct wined3d_cs_update_sub_resource
>      unsigned int sub_resource_idx;
>      struct wined3d_box box;
>      struct wined3d_sub_resource_data data;
> +    /* If data.data is NULL, the structure is followed by the data in memory. */
>  };
>
>  struct wined3d_cs_add_dirty_texture_region
> @@ -2597,6 +2598,7 @@ void wined3d_device_context_emit_blt_sub_resource(struct wined3d_device_context
>  static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const void *data)
>  {
>      const struct wined3d_cs_update_sub_resource *op = data;
> +    const void *src_data = op->data.data ? op->data.data : (const BYTE *)(op + 1);
>      struct wined3d_resource *resource = op->resource;
>      const struct wined3d_box *box = &op->box;
>      unsigned int width, height, depth, level;
> @@ -2617,7 +2619,7 @@ static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const voi
>              goto done;
>          }
>
> -        wined3d_buffer_upload_data(buffer, context, box, op->data.data);
> +        wined3d_buffer_upload_data(buffer, context, box, src_data);
>          wined3d_buffer_invalidate_location(buffer, ~WINED3D_LOCATION_BUFFER);
>          goto done;
>      }
> @@ -2630,7 +2632,7 @@ static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const voi
>      depth = wined3d_texture_get_level_depth(texture, level);
>
>      addr.buffer_object = 0;
> -    addr.addr = op->data.data;
> +    addr.addr = src_data;
>
>      /* Only load the sub-resource for partial updates. */
>      if (!box->left && !box->top && !box->front
> @@ -3375,8 +3377,35 @@ static void wined3d_deferred_context_update_sub_resource(struct wined3d_device_c
>          struct wined3d_resource *resource, unsigned int sub_resource_idx, const struct wined3d_box *box,
>          const void *data, unsigned int row_pitch, unsigned int slice_pitch)
>  {
> -    FIXME("context %p, resource %p, sub_resource_idx %u, box %s, data %p, row_pitch %u, slice_pitch %u, stub!\n",
> -            context, resource, sub_resource_idx, debug_box(box), data, row_pitch, slice_pitch);
> +    struct wined3d_cs_update_sub_resource *op;
> +    size_t data_size;
> +
> +    if (resource->type == WINED3D_RTYPE_BUFFER)
> +    {
> +        data_size = box->right - box->left;
> +    }
> +    else
> +    {
> +        const struct wined3d_format *format = resource->format;
> +
> +        data_size = (box->back - box->front - 1) * slice_pitch
> +                + ((box->bottom - box->top - 1) / format->block_height) * row_pitch
> +                + ((box->right - box->left + format->block_width - 1) / format->block_width) * format->block_byte_count;
> +    }
> +
> +    op = wined3d_device_context_require_space(context, sizeof(*op) + data_size, WINED3D_CS_QUEUE_DEFAULT);
> +    op->opcode = WINED3D_CS_OP_UPDATE_SUB_RESOURCE;
> +    op->resource = resource;
> +    op->sub_resource_idx = sub_resource_idx;
> +    op->box = *box;
> +    op->data.row_pitch = row_pitch;
> +    op->data.slice_pitch = slice_pitch;
> +    op->data.data = NULL;
> +    memcpy(op + 1, data, data_size);
> +
> +    wined3d_device_context_acquire_resource(context, resource);
> +
> +    wined3d_device_context_submit(context, WINED3D_CS_QUEUE_DEFAULT);
>  }

So this is, AFAIR, more or less what the staging patchset did. I don't
think that we want to put the data in the command stream though.

Always copying the data from the user-provided source to some other
storage is definitely the right choice: we don't want to block the
application (more or less directly) to make sure that the data stays
around until we need it. That's bad for performance for immediate
contexts and just impossible for deferred, so not much to argue there.
At the same time we also want to minimize the amount of copies.
Ideally, for a GPU resource, here we'd upload the data right to the
GPU (e.g. inside a persistently mapped buffer), ready to be used
whenever it's time. For a CPU resource instead we'd copy the data into
its final place in sysmem (what will eventually become its
"heap_memory"). We don't quite have the infrastructure for it at the
moment but that would be the goal IMO.
What you do in patch 2/2 is more along the lines of this idea. The
sysmem allocation mechanism works okay for the time being and could be
extended to handle other types of memory, e.g. it could eventually
take charge of the vulkan allocator, making it generic and usable from
GL too. I don't think it has to be managed by each deferred context
(i.e. it should probably be per-device), although the current scheme
has the advantage of not having to care about concurrency.

So, at least in my view, I'd bring the allocator bits from patch 2/2
into patch 1/2 and use that for update_sub_resource, both the deferred
and the immediate variants (the latter could be its own patch). You'd
only need the DISCARD portion for update_sub_resource, so the rest can
(and probably should) stay in the map() patch.