[PATCH 1/2] winegstreamer: Add helper for GstCaps <-> IMFMediaType conversion.

Thu Mar 26 20:07:24 CDT 2020

On 3/26/20 6:07 PM, Derek Lesho wrote:
> On 3/26/20 4:56 PM, Zebediah Figura wrote:
> 
>> There's another broad question I have with this approach, actually,
>> which is fundamental enough I have to assume it's at least had some thought
>> put into it, but it would be nice if that discussion happened in a more
>> public place, and was justified in the patches sent.
>>
>> Essentially, the question is: what if we were to use decodebin directly?
>>
>> As I understand (and admittedly Media Foundation is far more complex
>> than I could hope to understand) an application which just calls
>> IMFSourceResolver methods just needs to get back a working
>> IMFMediaSource, and we could wrap decodebin with one of those, similar
>> to the quartz wrapper.
>>
>> First of all, this is something I think we want to do anyway. Microsoft
>> has no demuxer for, say, Vorbis (at least, there's not one registered on
>> my Windows 10 machine), but I think that we want to be able to play back
>> Vorbis files anyway (in, say, a Win32 media player application). Instead
>> of writing yet another source for Vorbis, and for every other obscure
>> format, we just write one generic decodebin wrapper.
>>
>> Second of all, the most obvious benefit, at least while looking at these
>> patches, is that you now don't need to write caps <-> IMFMediaType
>> conversion for every type on the planet. Another benefit is that you let
>> all of the decoding happen within a single GStreamer pipeline, which is
>> probably better for performance. You also can simplify your
>> postprocessing step to adding a single videoconvert and audioconvert,
>> instead of having to manually (or semi-manually) add e.g. an h264 parser
>> element. These are some of the benefits I had in mind when removing the
>> GStreamer quartz transforms.
>>
>> Even in the case where the application manually creates e.g. an MPEG-4
>> source, my understanding is it's still the source's job to automatically
>> append transforms to match the requested type. We'd just be moving that
>> from the mfplat level to the gstreamer level—i.e. let decodebin select
>> the 'transforms' needed to convert to raw video and audio.
>>
>> It obviously wouldn't match native structure, but it's not clear to me
>> that it would fail to match native in a way that would cause problems.
>> Judging from my experience with quartz, most applications aren't going
>> to care how their media is decoded as long as they get raw samples out
>> of it. Only a select few build the graph manually because they don't
>> realize that they can autoplug, or make assumptions about which filters
>> will be present once autoplugging is done, and some of those even fall
>> back to autoplugging if their preferred method fails. Maybe the
>> situation is different with mfplat, but given that there is a way to let
>> mfplat figure out which sources and transforms to use, I'm gonna be
>> really surprised if most applications aren't using it.
>>
>> If you do come across an application that requires we mimic native's
>> specific arrangement of sources and transforms, it seems to me it
>> wouldn't require that much effort to swap a different parser in for
>> decodebin, and to implement the necessary bits in the media type
>> conversion functions. Ultimately I suspect it'd be less work to have a
>> decodebin wrapper + specific sources for applications that require them,
>> than to manually implement every source and transform.
> I'll make a more complete response to this tomorrow, but I really think 
> that doing the incorrect thing isn't worth the supposed simplicity your 
> method brings.  For instance, a commit I have on my local branch adding 
> an ASF source and WMV decoder is 126 lines long. Take a look: 
> https://github.com/Guy1524/wine/commit/37748e69bb25f3bf97f4dbfebaa830e3eb282dda 
> 

While I await your more complete response, I figure I might as well
clarify some things.

I don't think that "doing the incorrect thing", i.e. failing to exactly
emulate Windows, should necessarily be considered bad in itself, or at
least not nearly as bad as all that.

My view, and my understanding of the Wine project's view in general as
informed by its maintainers, is that emulating Windows is desirable for
public documented behaviour (obviously), for undocumented behaviour that
applications rely on (also obviously), and for undocumented or
semi-documented behaviour where there's no difference otherwise and
where the native thing to do is obvious (e.g. the name of an internal
registry key).

But there's not really a reason to emulate Windows otherwise. And in a
case like this, where there's a significant benefit to not emulating
Windows exactly, the only reason I see is "an application we don't know
yet *might* depend on it". When faced with such a risk, I weigh the
probability of that happening (on the evidence of DirectShow
applications, I see that as low) against the cost of having to change
the design (which also seems low to me). I can say from experience (cf.
5de712b5d) that swapping out a specific demuxer for decodebin isn't very
difficult.

Not to mention that what we're doing is barely "incorrect". Media
Foundation is an API that's specifically meant to be extended in this
way. For that matter, some application could easily register its own
codec libraries on Windows with a higher priority than the native ones
(this happened with DirectShow); that's essentially no different than
what I'm suggesting.

I think the linked commit misses the point somewhat. That's partially
because I don't think it makes sense to measure simplicity as an
absolute metric simply using line count, and partially because it's
missing the cost of adding other media types to the conversion functions
(which is one of the reasons, though not the only reason, I thought to
write this mail). But it's mostly because the cost of using decodebin,
where it works, is essentially zero: we write one media source, and it
works for everything; no extension for ASF required. If it never becomes
necessary to write a source that outputs compressed samples, then we
also don't have the cost of abstraction (which is always worth taking
seriously!), and if it does, we come out even—we can still use your
generic media source, or something like it.

Ultimately, I think that a decodebin wrapper is something we want to
have anyway, for the sake of host codecs like Theora, and once we have
it, I see zero cost in using it wherever else we can.