Wine GPU decoding
michael at fds-team.de
Mon Mar 31 18:48:43 CDT 2014
> I did some introductory interface reading. If I understand it correctly,
> the dxva implementation / driver can control the pool of the input
> surface. Not only that, it actually creates the surface. Is that correct?
> Afaics the output surface is either a dxva-created surface or a render
> target, is that correct?
All surfaces which are used in conjunction with the DXVA API are created
through the CreateSurface method of the IDirectXVideoAccelerationService
interface.
> If you are in system memory, is there an issue with using the d3d
> surface's memory as the vaapi input buffer? Also take note of user
> pointer surfaces / textures in d3d9ex.
The surfaces are only used for storing the output image, and they may
have a different size than the buffers used by vaapi. MPEG-2, for
example, uses macroblocks of 16x16 pixels, so the size of a frame must
be divisible by 16. I noticed that VLC creates the surfaces with the
size of the actual video while it initializes the decoders with
dimensions rounded up to a multiple of 16. Moreover, I cannot specify
the address to which the output data should be copied; I can only map
the buffer at an address chosen by vaapi and copy it manually.
> I do not know of any windows driver that supports YUV render targets
> (see above). Are dxva-created output surfaces video memory surfaces (or
> textures) or system memory surfaces? If they are sysmem surfaces you
> don't have a problem - the app either has to read back to sysmem or put
> up with an RGB surface / texture.
DXVA supports both: direct rendering (called native mode) and reading
the frame back to system memory.
> But even if you're copying to an RGB surface you have to get the GL
> texture from the IDirect3DSurface9 somehow. There may not even be one,
> if the surface is just the GL backbuffer. This is just a wine-internal
> problem though and should be solvable one way or another.
> The vaapi-glx interface is also missing options for the mipmap level and
> cube map face. I guess you can ignore that until you find an application
> that wants a video decoded to the negative z face, mipmap level 2, of a
> rendertarget-capable d3d cube texture.
> You may also want a way to make wined3d activate the device's WGL
> context. Right now that's not much of an issue if your code is called
> from the thread that created the device. The command stream will make
> this more difficult though.
We implemented a hack to get the OpenGL texture ID of a D3D9 surface
and to make the OpenGL context current by calling acquire_context(). As
mentioned in the first email, the screenshot was created using this hack.
> If the vaapi buffer has a constant address you can create a user memory
> d3d surface. I wouldn't be surprised if dxva was a motivation for user
> memory surfaces.
> On a related note, we don't want any GLX code in wined3d, and probably
> not in any dxva.dll. The vaapi-glx.h header seems simple enough to use
> through WGL as it just says a context needs to be active. If not, you'll
> have to export a WGL version of vaapi from winex11.drv.
> At some point we should think about equivalent interfaces on OSX and how
> to abstract between that and vaapi, but not today.
We actually thought about a better solution to get around these
problems. We could introduce a new surface type which uses the vaapi
buffers as its backend. If the user wants to read the memory back to
system memory, we can simply use the map function of vaapi, and if the
user wants to actually present the surface, we could use the vaapi
commands to convert it into an RGB texture, including post-processing
like deinterlacing. This would allow us to implement both native and
copy-back mode without doing unnecessary conversions or memory copies.
Do you think it would be okay if we try to add such a new type of
surface? I think we would need to put the vaapi commands into the x11
driver and export some functions which can be called from d3d.
I also uploaded the patches in their current state so that you guys can
take a look at what is actually needed to implement dxva2, but they are
not yet in a state in which they could go upstream (we use a separate
x11 connection, link statically against libva, use inefficient
algorithms for copying frames, ...)
You can find it here:
on the dxva2 branch.
To test it with VLC you need:
1. 32 bit version of libav-dev 1.2.1
On Ubuntu you can get this version of libav-dev from my PPA:
(except for Trusty Tahr, which already provides this version)
2. Install the vaapi driver; for Nvidia you need vdpau-va-driver.
Make sure that vainfo (apt-get install vainfo) shows the MPEG2 VLD decoder.
3. You also need to apply this nasty hack to get around a problem with
VLC and Direct3D: http://ix.io/bo5
4. Set the Windows version of the Wine prefix to Vista, as DXVA2 is only
available on Vista and newer.
5. Install the current git version of VLC (the stable version has a bug
in the DXVA2 code which breaks the decoding of P- and B-frames). You can
grab it here: http://nightlies.videolan.org/build/win32/last/
( See https://trac.videolan.org/vlc/ticket/10868 for more information
about the bug. It took me quite some time to figure out that this bug is
in VLC, and not in my code... )
6. Start VLC and enable DXVA2 in the Input/Codecs options. Test it :-)
I have not tried it on anything other than Nvidia yet, and there is
some untested code in the patches for features the vdpau wrapper does
not support, so it may break on other graphics cards.
For other users who want to try out the patchset and expect a huge
performance boost: I have to disappoint you! During my tests it was
still slower than CPU decoding, but I expect better performance once
all the copy overhead has been removed, and especially for other codecs
like H264 the performance boost should be easier to achieve.