[PATCH 3/3] wined3d: Handle NOOVERWRITE maps on persistently mapped Vulkan buffers from the client thread.

Sun Oct 10 22:53:23 CDT 2021

On 10/8/21 11:24 AM, Henri Verbeet wrote:
> On Thu, 7 Oct 2021 at 04:22, Zebediah Figura <zfigura at codeweavers.com> wrote:
>> +static void adapter_vk_flush_bo_address(struct wined3d_context *context,
>> +        const struct wined3d_const_bo_address *data, size_t size)
>> +{
>> +    struct wined3d_context_vk *context_vk = wined3d_context_vk(context);
>> +    const struct wined3d_vk_info *vk_info;
>> +    struct wined3d_device_vk *device_vk;
>> +    VkMappedMemoryRange range;
>> +    struct wined3d_bo_vk *bo;
>> +
>> +    if (!(bo = (struct wined3d_bo_vk *)data->buffer_object))
>> +        return;
>> +
>> +    vk_info = context_vk->vk_info;
>> +    device_vk = wined3d_device_vk(context->device);
>> +
>> +    range.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
>> +    range.pNext = NULL;
>> +    range.memory = bo->vk_memory;
>> +    range.offset = bo->memory_offset + (uintptr_t)data->addr;
>> +    range.size = size;
>> +    VK_CALL(vkFlushMappedMemoryRanges(device_vk->vk_device, 1, &range));
>> +}
>> +
> That has a fair amount of overlap with adapter_vk_unmap_bo_address().
> Would a common helper make sense?

Indeed, although the common helper may end up just being 
adapter_vk_flush_bo_address()...

> 
>> +static void *adapter_vk_alloc_upload_bo(struct wined3d_device *device, struct wined3d_resource *resource,
>> +        unsigned int sub_resource_idx, const struct wined3d_box *box, unsigned int *row_pitch,
>> +        unsigned int *slice_pitch, uint32_t flags, struct upload_bo *upload_bo)
>> +{
>> +    wined3d_not_from_cs(device->cs);
>> +
>> +    /* Limit NOOVERWRITE maps to buffers for now; there are too many ways that
>> +     * a texture can be invalidated to even count. */
>> +    if (wined3d_map_persistent() && resource->type == WINED3D_RTYPE_BUFFER && (flags & WINED3D_MAP_NOOVERWRITE))
>> +    {
>> +        struct wined3d_client_sub_resource *sub_resource;
>> +        const struct wined3d_bo_vk *bo_vk;
>> +        uint8_t *map_ptr;
>> +
>> +        sub_resource = wined3d_resource_get_client_sub_resource(resource, sub_resource_idx);
>> +
>> +        bo_vk = (const struct wined3d_bo_vk *)sub_resource->addr.buffer_object;
>> +        map_ptr = bo_vk ? bo_vk->map_ptr : NULL;
>> +        map_ptr += (uintptr_t)sub_resource->addr.addr;
>> +
>> +        if (!map_ptr)
>> +        {
>> +            TRACE("Sub-resource is not persistently mapped.\n");
>> +            return NULL;
>> +        }
>> +
>> +        wined3d_resource_get_sub_resource_map_pitch(resource, sub_resource_idx, row_pitch, slice_pitch);
>> +
>> +        upload_bo->addr = *wined3d_const_bo_address(&sub_resource->addr);
>> +        upload_bo->flags = 0;
>> +        if (bo_vk)
>> +            map_ptr += bo_vk->memory_offset;
>> +        map_ptr = resource_offset_map_pointer(resource, sub_resource_idx, map_ptr, box);
>> +
>> +        if (!(bo_vk->memory_type & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT))
>> +            upload_bo->flags |= UPLOAD_BO_FLUSH_ON_UNMAP;
>> +
>> +        return map_ptr;
>> +    }
>> +
>> +    return NULL;
>> +}
> 
> Do we need adapter_vk_alloc_upload_bo() as such? I suppose it would
> make sense to expose something like wined3d_context_vk_create_bo()
> through the adapter ops, but the "upload bo" mechanism seems more
> generic than that. In its current form, this also doesn't allocate all
> that much, and doesn't seem that all that specific to the Vulkan
> backend. We use the wined3d_bo_vk structure, but the "map_ptr" and
> "memory_offset" would apply just as well to a GL equivalent. I.e., if
> we introduced a common wined3d_bo structure, this could just be in
> wined3d_cs_map_upload_bo().

We don't need it yet, in this patch. We will need it later (or at least, 
it looks harder to make accelerated discard generic), but I believe we 
should be able to pull out the NOOVERWRITE parts, and, as you say, make 
a function called "alloc" really always allocate something.

> 
>> +static void invalidate_persistent_map(struct wined3d_resource *resource, unsigned int sub_resource_idx)
>> +{
>> +    struct wined3d_client_sub_resource *sub_resource;
>> +
>> +    sub_resource = wined3d_resource_get_client_sub_resource(resource, sub_resource_idx);
>> +    memset(&sub_resource->addr, 0, sizeof(sub_resource->addr));
>> +}
>> +
> Is this function correctly named? It doesn't quite do what I'd expect
> something called invalidate_persistent_map() to do.
> 

I suppose your expectation is that it invalidates the map contents, 
which it does not, but rather it invalidates the map pointer. I don't 
know how better to name it to clarify that difference. (Shove a _ptr 
onto the end, maybe?)

Essentially it means "the CS thread might reallocate the BO behind our 
back, thereby invalidating bo->map_ptr, which means we can't treat this 
one as canonical anymore". It's definitely awkward, but I don't see a 
better way to achieve this...

>> @@ -2511,6 +2539,9 @@ HRESULT wined3d_device_context_emit_map(struct wined3d_device_context *context,
>>
>>       wined3d_resource_wait_idle(resource);
>>
>> +    /* We might end up loading the resource into sysmem. */
>> +    invalidate_persistent_map(resource, sub_resource_idx);
>> +
> Sure, we might end up loading the resource into sysmem. That doesn't
> necessarily explain why we would need to call
> invalidate_persistent_map() though. (I.e., the issue is with
> invalidating the BUFFER location, not so much with loading into the
> SYSMEM location.)
> 

Right, I'll try to clarify that comment.

>> @@ -2554,7 +2585,7 @@ HRESULT wined3d_device_context_emit_unmap(struct wined3d_device_context *context
>>           unsigned int row_pitch, slice_pitch;
>>
>>           wined3d_resource_get_sub_resource_map_pitch(resource, sub_resource_idx, &row_pitch, &slice_pitch);
>> -        if (bo.flags & UPLOAD_BO_UPLOAD_ON_UNMAP)
>> +        if (bo.flags & (UPLOAD_BO_UPLOAD_ON_UNMAP | UPLOAD_BO_FLUSH_ON_UNMAP))
>>               wined3d_device_context_upload_bo(context, resource, sub_resource_idx, &box, &bo, row_pitch, slice_pitch);
>>           return WINED3D_OK;
>>       }
> Uploading when FLUSH_ON_UNMAP is set seems odd. I.e., should there
> ever be a case where FLUSH_ON_UNMAP is set, UPLOAD_ON_UNMAP is not
> set, and we still want to call wined3d_device_context_upload_bo()?
> 

As it stands this patch never has both FLUSH_ON_UNMAP and 
UPLOAD_ON_UNMAP set at the same time. I essentially was thinking of them 
as different things to do on an unmap. I guess in that sense "UPLOAD" 
should really have been "COPY" or "BLIT" or something.

>> @@ -2581,8 +2612,7 @@ static void wined3d_cs_exec_blt_sub_resource(struct wined3d_cs *cs, const void *
>>       if (op->dst_resource->type == WINED3D_RTYPE_BUFFER)
>>       {
>>           wined3d_buffer_copy(buffer_from_resource(op->dst_resource), op->dst_box.left,
>> -                buffer_from_resource(op->src_resource), op->src_box.left,
>> -                op->src_box.right - op->src_box.left);
>> +                buffer_from_resource(op->src_resource), op->src_box.left, op->src_box.right - op->src_box.left);
>>       }
> That's a whitespace change. The change is not particularly wrong, but
> it doesn't seem particularly relevant to the rest of this patch
> either.

Oops, must have been left over from something else...

> 
>> +    /* This might result in an implicit discard. */
>> +    if (dst_resource->type == WINED3D_RTYPE_BUFFER && dst_box->right - dst_box->left == dst_resource->size)
>> +        invalidate_persistent_map(dst_resource, 0);
>> +
> Somewhat like the other invalidate_persistent_map() comment above, it
> would require someone to already be fairly familiar with the code in
> question to understand the comment. (I.e., the issue here is that if
> the CS thread does end up doing a DISCARD for this operation, any
> subsequent operations would need to reference the new bo instead of
> the current one.)
> 
>> @@ -2727,8 +2761,22 @@ static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const voi
>>       if (resource->type == WINED3D_RTYPE_BUFFER)
>>       {
>>           struct wined3d_buffer *buffer = buffer_from_resource(resource);
>> +        size_t size = box->right - box->left;
>>
>> -        wined3d_buffer_copy_bo_address(buffer, context, box->left, &op->bo.addr, box->right - box->left);
>> +        if (op->bo.flags & UPLOAD_BO_FLUSH_ON_UNMAP)
>> +        {
>> +            wined3d_context_flush_bo_address(context, &op->bo.addr, size);
>> +        }
>> +        else
>> +        {
>> +            wined3d_buffer_copy_bo_address(buffer, context, box->left, &op->bo.addr, size);
>> +        }
>> +        goto done;
>> +    }
>> +
> It seems like we're mostly using FLUSH_ON_UNMAP here to determine
> whether "op->bo.addr" refers to the same GPU memory as
> "buffer->buffer_object". Could we perhaps just compare them?

I think that works.

> 
>>   static bool wined3d_cs_unmap_upload_bo(struct wined3d_device_context *context, struct wined3d_resource *resource,
>>           unsigned int sub_resource_idx, struct wined3d_box *box, struct upload_bo *bo)
>>   {
>> +    struct wined3d_client_sub_resource *sub_resource;
>> +
>> +    sub_resource = wined3d_resource_get_client_sub_resource(resource, sub_resource_idx);
>> +    if (sub_resource->upload_bo.addr.buffer_object || sub_resource->upload_bo.addr.addr)
>> +    {
> I guess that works; could perhaps do with a helper.

Sure.

> 
>> +        *bo = sub_resource->upload_bo;
>> +        *box = sub_resource->upload_box;
>> +        memset(&sub_resource->upload_bo, 0, sizeof(sub_resource->upload_bo));
>> +        memset(&sub_resource->upload_box, 0, sizeof(sub_resource->upload_box));
>> +        return true;
>> +    }
>> +
>>       return false;
>>   }
>>
> We might as well just return if the condition is false, and save some
> indentation.
> 
>> +struct wined3d_client_sub_resource
>> +{
>> +    struct wined3d_bo_address addr;
>> +
>> +    struct upload_bo upload_bo;
>> +    struct wined3d_box upload_box;
>> +};
>> +
> I'm tempted to think we want to track this per resource instead of per
> sub-resource. For buffers it doesn't really matter though, and for
> textures we don't implement it yet. Perhaps we should just leave
> textures alone for now? That would also avoid
> wined3d_resource_get_client_sub_resource() for the moment.
> 

Indeed.

Although I've found already that some games could really do with 
accelerated texture discard :-/