[PATCH 1/2] winegstreamer: Add helper for GstCaps <-> IMFMediaType conversion.

Derek Lesho dlesho at codeweavers.com
Fri Mar 27 10:05:40 CDT 2020

On 3/26/20 4:56 PM, Zebediah Figura wrote:
> There's another broad question I have with this approach, actually,
> which is fundamental enough I have to assume it's had some thought
> put into it, but it would be nice if that discussion happened in a more
> public place, and was justified in the patches sent.
> Essentially, the question is: what if we were to use decodebin directly?
> As I understand (and admittedly Media Foundation is far more complex
> than I could hope to understand) an application which just calls
> IMFSourceResolver methods just needs to get back a working
> IMFMediaSource, and we could wrap decodebin with one of those, similar
> to the quartz wrapper.
The most basic applications (games) seem to either use a source reader 
or simple sample grabber media session to get their raw samples.  If you 
want to add a hack for using decodebin, you can easily add a special 
source type, and for the media source of that type, just make a 
decodebin element instead of searching for a demuxer.  In this case, the 
source reader wouldn't search for a decoder since the output type set by 
the application would be natively supported by the source.  Then, as 
part of the hack, just always yield that source type in the source 
resolver.  This is completely incorrect and probably shouldn't make its
way into mainline, IMO.  Also, I have reason to believe it may break 
Unity3D, as they do look at the native media types supported by the 
source, and getting around this would require adding some hackery in the 
source reader.
> First of all, this is something I think we want to do anyway. Microsoft
> has no demuxer for, say, Vorbis (at least, there's not one registered on
> my Windows 10 machine), but I think that we want to be able to play back
> Vorbis files anyway (in, say, a Win32 media player application).
I'm pretty sure our goal is not to extend windows functionality.
>   Instead
> of writing yet another source for vorbis,
You don't "write another source", you just expose a new source object 
and link it with a new source_desc structure, which specifies the mime 
type of the container format: 
>   and for each other obscure
> format, we just write one generic decodebin wrapper.
Not to mention, you'd have to perform this step with a decodebin wrapper 
> Second of all, the most obvious benefit, at least while looking at these
> patches, is that you now don't need to write caps <-> IMFMediaType
> conversion for every type on the planet.
I don't see this as a problem; most games I've seen will use either
H.264 or WMV, and adding new formats isn't that difficult.  You look at
the caps exposed by the gstreamer demuxer, find the equivalent 
attributes in media foundation, and fill in the gaps.  In return you get 
correct behavior, and a source that can be paired with a correctly 
written MFT from outside of the wine source.
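To make the "look at the caps, find the equivalent attributes" step
concrete, here is a minimal sketch in C.  It is not the code from the
patch: struct caps_mapping, caps_map and find_caps_mapping() are
hypothetical names, and plain strings stand in for the Media Foundation
GUIDs so that the example builds without GStreamer or mfplat headers.
The WMV entry assumes the demuxer reports wmvversion=3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one GStreamer caps name and the Media
 * Foundation major type / subtype it corresponds to.  A real conversion
 * would build an IMFMediaType and also copy attributes such as frame
 * size and frame rate; strings keep this sketch self-contained. */
struct caps_mapping
{
    const char *gst_caps_name;
    const char *mf_major_type;
    const char *mf_subtype;
};

static const struct caps_mapping caps_map[] =
{
    {"video/x-h264", "MFMediaType_Video", "MFVideoFormat_H264"},
    /* Assumes wmvversion=3 in the caps; other versions map elsewhere. */
    {"video/x-wmv",  "MFMediaType_Video", "MFVideoFormat_WMV3"},
};

/* Returns the matching entry, or NULL if the format has not been added
 * to the table yet. */
const struct caps_mapping *find_caps_mapping(const char *gst_caps_name)
{
    size_t i;

    for (i = 0; i < sizeof(caps_map) / sizeof(caps_map[0]); ++i)
    {
        if (!strcmp(caps_map[i].gst_caps_name, gst_caps_name))
            return &caps_map[i];
    }
    return NULL;
}
```

In this sketch, supporting another format means adding one more row to
the table; a lookup for a caps name that has no row yet simply returns
NULL.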
>   Another benefit is that you let
> all of the decoding happen within a single GStreamer pipeline, which is
> probably better for performance.
I have applications working right now with completely acceptable 
performance, and we are still copying every uncompressed sample an extra 
time, which we may be able to optimize away.  Copying compressed 
samples, on the other hand, is not that big of a deal at all.
>   You also can simplify your
> postprocessing step to adding a single videoconvert and audioconvert,
> instead of having to manually (or semi-manually) add e.g. an h264 parser
> element.
It isn't manual; we find a parser which corrects the caps.  And as I
mentioned in an earlier email, we could also use caps negotiation for
this; all the setup is in place.
>   These are some of the benefits I had in mind when removing the
> GStreamer quartz transforms.
> Even in the case where the application manually creates e.g. an MPEG-4
> source, my understanding is it's still the source's job to automatically
> append transforms to match the requested type.
It's not the source's job at all.  On Windows, where sources are
purpose-built, they apply no transformations to the types they get;
their goal is only to get raw sample data from a container / stream.
It's the job of the media session or source reader to apply transforms
when needed.
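That division of labour can be sketched in C as follows.  This is an
illustration under assumptions, not Wine's source reader: enum
reader_plan, struct decoder_desc and plan_output() are hypothetical
names, and media types are reduced to plain strings.

```c
#include <stddef.h>
#include <string.h>

enum reader_plan
{
    PLAN_DIRECT,         /* source already exposes the requested type */
    PLAN_INSERT_DECODER, /* a registered decoder bridges the gap */
    PLAN_UNSUPPORTED,    /* nothing can produce the requested type */
};

/* Hypothetical description of a registered decoder transform. */
struct decoder_desc
{
    const char *input_type;
    const char *output_type;
};

/* The source only reports its native types.  The reader decides whether
 * a transform is needed to reach the type the application asked for. */
enum reader_plan plan_output(const char *const *native_types, size_t native_count,
        const struct decoder_desc *decoders, size_t decoder_count,
        const char *requested)
{
    size_t i, j;

    /* No transform needed if the source natively supports the type. */
    for (i = 0; i < native_count; ++i)
    {
        if (!strcmp(native_types[i], requested))
            return PLAN_DIRECT;
    }

    /* Otherwise look for a decoder from a native type to the request. */
    for (i = 0; i < native_count; ++i)
    {
        for (j = 0; j < decoder_count; ++j)
        {
            if (!strcmp(decoders[j].input_type, native_types[i])
                    && !strcmp(decoders[j].output_type, requested))
                return PLAN_INSERT_DECODER;
        }
    }

    return PLAN_UNSUPPORTED;
}
```

Because the decoder list is passed in separately from the source, a
third party decoder only has to appear in that list; the source itself
does not change.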
>   We'd just be moving that
> from the mfplat level to the gstreamer level—i.e. let decodebin select
> the 'transforms' needed to convert to raw video and audio.
The media session and source reader shouldn't be affected by 
winegstreamer details.  If a user/an application decides to install a 
third party decoder, we still need the infrastructure in place for this 
to function.
> It obviously wouldn't match native structure, but it's not clear to me
> that it would fail to match native in a way that would cause problems.
> Judging from my experience with quartz, most applications aren't going
> to care how their media is decoded as long as they get raw samples out
> of it.
Most games, or most applications?  Chromium uses media foundation in a 
much more granular way.
>   Only a select few build the graph manually because they don't
> realize that they can autoplug, or make assumptions about which filters
> will be present once autoplugging is done, and some of those even fall
> back to autoplugging if their preferred method fails. Maybe the
> situation is different with mfplat, but given that there is a way to let
> mfplat figure out which sources and transforms to use, I'm gonna be
> really surprised if most applications aren't using it.
> If you do come across an application that requires we mimic native's
> specific arrangement of sources and transforms, it seems to me it
> wouldn't require that much effort to swap a different parser in for
> decodebin, and to implement the necessary bits in the media type
> conversion functions. Ultimately I suspect it'd be less work to have a
> decodebin wrapper + specific sources for applications that require them,
> than to manually implement every source and transform.
The current solution isn't very manual, and, as I mentioned earlier in 
this email, you also can construct a decodebin wrapper source using the 
infrastructure which is available.  And in general terms, I think it's 
more work to maintain a solution that doesn't match up to windows, as we 
now have to think of all these edge cases and how to work around them.
On 3/26/20 8:07 PM, Zebediah Figura wrote:
> While I await your more complete response, I figure I might as well
> clarify some things.
> I don't think that "doing the incorrect thing", i.e. failing to exactly
> emulate Windows, should necessarily be considered bad in itself, or at
> least not nearly as bad as all that.
> My view, and my understanding of the Wine project's view in general as
> informed by its maintainers, is that emulating Windows is desirable for
> public documented behaviour (obviously), for undocumented behaviour that
> applications rely on (also obviously), for undocumented or
> semi-documented behaviour where there's no difference otherwise and
> where the native thing to do is obvious (e.g. the name of an internal
> registry key).
In my view, when completely incorrect behavior is only a few function 
calls away, that's not acceptable.  The media source is a well 
documented public interface, and doing something different instead is 
just asking for trouble.
> But there's not really a reason to emulate Windows otherwise. And in a
> case like this, where there's a significant benefit to not emulating
> Windows exactly, the only reason I see is "an application we don't know
> yet *might* depend on it". When faced with such a risk, I weigh the
> probability of that happening—and on the evidence of DirectShow
> applications, I see that as low—with the cost of having to change
> design—which also seems low to me; I can say from experience (c.f.
> 5de712b5d) that swapping out a specific demuxer for decodebin isn't very
> difficult.
The converse of this is also true: if you want to quickly experiment
with some gstreamer codec that we don't support yet, you just perform 
the hack I mentioned earlier, and then after you get it working you make 
it correct by adding the necessary gstreamer caps. Another hack we could 
use is to serialize the compressed caps, and throw them in a 
MF_MT_USER_DATA attribute, and hope that an application never looks.  
But as I mentioned earlier, I don't think the amount of work required 
for adding a new media type is excessive.  Microsoft only ships a
limited number of sources and decoders; they fit on a single page: 
, so it's not like we'll be adding new types for years to come.
> Not to mention that what we're doing is barely "incorrect". Media
> Foundation is an API that's specifically meant to be extended in this
> way.
I don't think Microsoft ever meant for an application to make a media 
source that decodes compressed content, the source reader and media 
session exist for a reason.
>   For that matter, some application could easily register its own
> codec libraries on Windows with a higher priority than the native ones
> (this happened with DirectShow); that's essentially no different than
> what I'm suggesting.
Yes, but even in that case, I assume they will still follow the basic 
concept of what a source is and is not.
> I think the linked commit misses the point somewhat. That's partially
> because I don't think it makes sense to measure simplicity as an
> absolute metric simply using line count,
It's not just line count; the code itself is very simple.  All we are
doing is registering the supported input and output types of the
decoder, setting the mime type of the container format for the source,
and registering both objects.
>   and partially because it's
> missing the cost of adding other media types to the conversion functions
You can use the MF_MT_USER_DATA serialization hack if you're worried 
about that.
> (which is one of the reasons, though not the only reason, I thought to
> write this mail). But it's mostly because the cost of using decodebin,
> where it works, is essentially zero:
Except in the cases where an application does something unexpected.
>   we write one media source,
>   and it
> works for everything; no extension for ASF required.
There already is only one real implementation of the media source; the
only "extension" is adding the mime type instead of using typefind.  We
will register the necessary byte stream handlers no matter which path we
take.
>   If it never becomes
> necessary to write a source that outputs compressed samples, then we
> also don't have the cost of abstraction (which is always worth taking
> seriously!), and if it does, we come out even—we can still use your
> generic media source, or something like it.
> Ultimately, I think that a decodebin wrapper is something we want to
> have anyway, for the sake of host codecs like Theora,
Where would we use support for Theora, if no Windows applications are
able to use it?
>   and once we have
> it, I see zero cost in using it wherever else we can.
