[PATCH 4/5] wined3d: Introduce shader_run_compute operation for the spirv backend.

Wed Aug 11 07:25:30 CDT 2021

On Wed, 11 Aug 2021 at 11:50, Jan Sikorski <jsikorski at codeweavers.com> wrote:
> > On 10 Aug 2021, at 20:50, Henri Verbeet <hverbeet at gmail.com> wrote:
> > On Tue, 10 Aug 2021 at 10:35, Jan Sikorski <jsikorski at codeweavers.com> wrote:
> >> ---
> >> dlls/wined3d/shader_spirv.c    | 112 +++++++++++++++++++++++++++++++++
> >> dlls/wined3d/wined3d_private.h |   2 +
> >> 2 files changed, 114 insertions(+)
> >>
> > And this too is never used anywhere until patch 5/5 in this series.
>
> Yes, do I squash it all together then?
>
2, 4, and 5 would probably need to be together, yes. In the case of
wined3d_view_load_location(), you can still do that as a separate
patch if you use it in wined3d_rendertarget_view_load_location() in
the same patch.

> >> +    .shader_run_compute = wined3d_spirv_run_compute,
> >> };
> >>
> > "shader_run_compute" remains uninitialised for the other shader backends.
>
> It should be initialised to NULL, which seemed appropriate, but if we don’t want this I’ll make it spit an ERR.
>
So far we've gone with printing a FIXME/ERR for unimplemented
wined3d_shader_backend_ops operations, yes. Perhaps we could revisit
that, but even then we'd probably want to put an explicit NULL in e.g.
"glsl_shader_backend".

> > Conceptually, it doesn't seem quite proper for the shader backends to
> > do compute dispatch by themselves; ideally these would only translate
> > shaders, although in practice they're also responsible for setting up
> > some related state, for OpenGL in particular.
> >
> > I'm not sure whether you perhaps already considered and rejected this,
> > but would it be very hard to use the existing .shader_select_compute()
> > operation from wined3d_unordered_access_view_vk_clear()? Or perhaps
> > simply adapter_vk_dispatch_compute()?
>
> My understanding is that I can’t touch wined3d_state, so in order to unite with existing functionality I’d have to decouple it from the state. Maybe we want this anyway? It looks straightforward to do for shader_select_compute. adapter_vk_dispatch_compute() mostly applies state so I don’t see how it could be of use here.
>
I think we have a number of options, each with their advantages and
disadvantages:

  - We can do this largely adapter agnostic in
wined3d_device_context_clear_uav_float()/wined3d_device_context_clear_uav_uint()
by using wined3d_state_create(), wined3d_device_context_set_state(),
wined3d_device_context_set_shader(),
wined3d_device_context_set_constant_buffers(),
wined3d_device_context_set_unordered_access_views(), and
wined3d_device_context_dispatch(). The nice thing about that approach
is that it would work just as well for the OpenGL backend, even
without ARB_clear_texture and ARB_clear_buffer, or e.g. for a
hypothetical Metal backend. The main disadvantages would be that
wined3d_device_context_set_state() is a fairly heavy operation, and we
wouldn't be able to use the same approach for e.g. depth/stencil
readback.

  - Instead of using wined3d_state_create() and
wined3d_device_context_set_state(), we could modify the existing
device state before dispatch, and restore it afterwards. For graphics
shaders that kind of thing can be a bit painful, but we wouldn't have
to touch that much state for compute shaders. That still doesn't help
for depth/stencil readback though.

  - Instead of doing this on the application side of the command
stream, we could do it from
wined3d_cs_exec_clear_unordered_access_view() or
wined3d_unordered_access_view_vk_clear(). We'd now have to manually
set and invalidate the relevant state, but perhaps that's still ok.
The basic principle should generalise to other compute shaders, and
we're still not using anything adapter-specific, so we could e.g. call
this from wined3d_unordered_access_view_gl_clear() as a fallback. (Or
simply the main implementation; glClearTexSubImage() is convenient,
but I'm not sure whether it's any more efficient than using compute
shaders on any GL implementation.)

  - There's a chance that implementing this in a Vulkan-specific way
would be more efficient. E.g. there may be an advantage to using push
constants instead of uniform buffers, and we'd avoid some state
processing. It would be hard to quantify that without doing some
benchmarking first though. If we take this approach, we essentially
want a small wrapper around vkd3d_shader_compile() from the shader
backend; we give it a D3D shader, some interface/binding information,
and get back a SPIR-V shader. Decoupling shader_select_compute() from
the state largely gets us there, although it introduces some asymmetry
between shader_select() and shader_select_compute() (and decoupling
shader_select() from the state won't be quite as easy). Arguably
there's not much of a point in passing the shader backend a
wined3d_shader in this case either; we may as well just pass it the
byte code. This approach is likely the most flexible and the most
efficient, but also likely the most complicated. By its nature, we
can't use the same implementation for different adapter types.