[PATCH 0/5] MR197: winegstreamer: Add support for zero-copy in wg_transform.

Wed Jun 8 03:37:11 CDT 2022

> >``` @@ -116,8 +160,15 @@ static GstMemory *wg_allocator_alloc(GstAllocator *gst_allocator, gsize size,
> >       memory->unix_memory = gst_allocator_alloc(NULL, size, params);
> >       gst_memory_map(memory->unix_memory, &memory->unix_map_info, GST_MAP_WRITE);
> >   
> > -    GST_INFO("Allocated memory %p, unix_memory %p, data %p", memory, memory->unix_memory,
> > -            memory->unix_map_info.data);
> > +    pthread_mutex_lock(&allocator->mutex);
> > +
> > +    memory->sample = allocator->request_sample(size, allocator->request_sample_context);
> > +    list_add_tail(&allocator->memory_list, &memory->entry);
> > +
> > +    pthread_mutex_unlock(&allocator->mutex);
> > +
> > +    GST_INFO("Allocated memory %p, sample %p, unix_memory %p, data %p", memory,
> > +            memory->sample, memory->unix_memory, memory->unix_map_info.data);
> >       return (GstMemory *)memory;
> >   }
> >```
> 
> Could we do this without the callback? Just thinking out loud, maybe 
> something like
> 
> void wg_allocator_add_sample(WgAllocator *allocator,
>          struct wg_sample *sample);
> void wg_allocator_remove_sample(WgAllocator *allocator,
>          struct wg_sample *sample);
> 
> which manipulate an internal wg_sample pointer, instead of having them 
> access the wg_transform's sample pointer. Then 
> wg_allocator_remove_sample() would also end up calling 
> wg_allocator_release_sample().
> 
> I suspect I'm missing something, though, given my below comment...

The callback is with a future use from wg_parser in mind. In that case
we would forward allocation requests to the read thread (for instance).

> >```       if (!(sample->flags & WG_SAMPLE_FLAG_INCOMPLETE))
> >       {
> > +        /* Taint the buffer memory to make sure it cannot be reused by the buffer pool,
> > +         * for the pool to always requests new memory from the allocator, and so we can
> > +         * then always provide output sample memory to achieve zero-copy.
> > +         *
> > +         * However, some decoder keep a reference on the buffer they passed downstream,
> > +         * to re-use it later. In this case, it will not be possible to do zero-copy,
> > +         * and we should copy the data back to the buffer and leave it unchanged.
> > +         *
> > +         * Some other plugins make assumptions that the returned buffer will always have
> > +         * at least one memory attached, we cannot just remove it and need to replace the
> > +         * memory instead.
> > +         */
> > +        if ((discard_data = gst_buffer_is_writable(output_buffer)))
> > +            gst_buffer_replace_all_memory(output_buffer, gst_allocator_alloc(NULL, 0, NULL));
> > +
> >           gst_sample_unref(transform->output_sample);
> >           transform->output_sample = NULL;
> >       }
> >```
> 
> Ah, took me a minute to understand this. To make sure I've got it right: 
> this isn't a correctness issue (i.e. the patch would be fine without it, 
> if less efficient), but the point is that if we keep our newly allocated 
> GstMemoryâ€”which no longer has a wg_sample attachedâ€”in circulation, the 
> buffer pool will continue to use it instead of requesting a new sample, 
> which means that samples after the first won't be zero-copied, and to 
> avoid this, we fill the GstBuffer with a useless empty GstMemory object. 
> Is that right?

Yes. It'd be better to use gst_buffer_remove_all memory, but VA-API
plugins makes some incorrect assumptions and it causes a CRITICAL
message.

> Had to look this up, but apparently the default GstBufferPool will throw 
> away buffers if they're modified. I thought the "empty" buffers would 
> pile up and never be freed, but evidently not.
> 
> I've probably had this question answered before, but can we replace the 
> buffer pool itself, instead of just the allocator?

We could reimplement a custom buffer pool that always free buffers, but
that would be more work and still won't solve the underlying problem.

The problem the allocator is solving is that some decoders don't release
their buffers to the pool. The gst_buffer_is_writable check catches that
and gst_buffer_replace_all_memory would fail anyway if we're not the
only ones holding a ref on the buffer. So instead of discarding the data
we copy it back into the buffer unix memory.

Later, when the decoder will eventually release the buffer to the pool,
it will be untainted, and so not freed until the pool is destroyed, and
reusable without going through the allocator. After a few iterations the
pool has enough allocated buffers, we won't get any allocation requests
anymore and our samples are never used directly and we copy the data
normally.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/197#note_1693