[PATCH vkd3d 2/5] vkd3d-shader/hlsl: Perform a copy propagation pass.

Thu Nov 11 07:43:36 CST 2021

On Thu, Nov 11, 2021 at 12:50 PM Giovanni Mascellani
<gmascellani at codeweavers.com> wrote:
>
> Hi,
>
> On 11/11/21 12:14, Matteo Bruni wrote:
> >> Notice that variables can have more than four components. Matrices can
> >> have up to 16 and arrays even more.
> > Right, but we probably don't want or need to do copy propagation on
> > those i.e. copy propagation should probably happen after matrix /
> > struct / array splitting.
>
> Mmh, then there is something about splitting that I'm not understanding.
>
> My understanding so far was that variables themselves are not split:
> they are just there, and do not appear in the code as themselves.  What
> gets split are the temporaries that appear when some piece of code
> actually does something with (say) a matrix. So, for example, if you
> have this fragment of code:
>
> ---
> float4x4 a;
> float4x4 b;
> float4x4 c;
> c = a + b;
> ---
>
> the compiler first naively represents it as:
>
> ---
> float4x4 a
> float4x4 b
> float4x4 c
> @1 = load(a) of type float4x4
> @2 = load(b) of type float4x4
> @3 = + (@1 @2) of type float4x4
> store(c, @3)
> ---
>
> and then this gets split as:
>
> ---
> float4x4 a
> float4x4 b
> float4x4 c
> @1 = load(a, 0) of type float4
> @2 = load(b, 0) of type float4
> @3 = + (@1 @2) of type float4
> store(c, 0, @3)
> @5 = load(a, 4) of type float4
> @6 = load(b, 4) of type float4
> @7 = + (@5 @6) of type float4
> store(c, 4, @7)
> ...
> ---
>
> That is, the variables keep their type, even though the accesses (loads
> and stores) to the variables have a smaller type. That's my
> understanding of what we want. My code mirrors this, and therefore allows a
> variable to have more than four registers.
>
> What is the advantage of splitting variables themselves?

Continuing from your example: assuming a, b and c are temporaries, you
split them into 4 vectors each and update the LOAD and STORE
instructions to point to the specific vector. Once that is done, it
becomes explicit that those groups of 4 instructions (LOAD x2, ADD,
STORE) are in fact entirely independent from each other. That alone
might help further transformations down the road.
It's also pretty nice for register allocation, as it's easier to
allocate 4 groups of 4 registers rather than a single contiguous group
of 16. Sometimes you can even find out that whole rows / columns are
unused and drop them altogether.
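
As a rough sketch of the above (hypothetical C, not the actual vkd3d
code): assuming offsets are counted in components as in your example,
and that every access is vector-sized with a constant offset, splitting
maps each offset to a (row, offset-in-row) pair, and a row that is never
loaded can go away together with the stores into it. The names
split_offset() and find_loaded_rows() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VEC_SIZE 4u /* components per vector */

/* Hypothetical record of one vector-sized access at a constant offset. */
struct access
{
    unsigned int offset; /* component offset into the unsplit variable */
    bool is_store;
};

/* Location of an access once the variable is split into row vectors. */
struct split_location
{
    unsigned int row;    /* which of the new vector variables */
    unsigned int offset; /* component offset inside that vector */
};

static struct split_location split_offset(unsigned int offset)
{
    struct split_location loc = {offset / VEC_SIZE, offset % VEC_SIZE};
    return loc;
}

/* Set loaded[i] for every row that is read at least once and return how
 * many rows that is. A row that is never read, together with the stores
 * into it, is a candidate for removal. */
static unsigned int find_loaded_rows(const struct access *accesses, size_t count,
        bool *loaded, unsigned int row_count)
{
    unsigned int i, loaded_count = 0;
    size_t j;

    for (i = 0; i < row_count; ++i)
        loaded[i] = false;

    for (j = 0; j < count; ++j)
    {
        unsigned int row = split_offset(accesses[j].offset).row;

        assert(row < row_count);
        if (!accesses[j].is_store && !loaded[row])
        {
            loaded[row] = true;
            ++loaded_count;
        }
    }
    return loaded_count;
}
```

With the accesses from the float4x4 example, load(a, 4) ends up as a
load at offset 0 of the second row vector, and nothing in it depends on
the other three rows any more.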

The same applies to all the complex types of course, not just
matrices. There is a complication with the above in that sometimes it
can be impossible to split the vars, namely when a load / store
offset is not known at compile time. That's a bit unfortunate
but it should also be pretty rare in HLSL shaders. I think it's
worthwhile to optimize for the common case and accept that we won't
necessarily have the best code when things align badly for us.
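
To make the condition concrete, here is a minimal sketch of the check I
have in mind (again hypothetical C; struct var_access and
can_split_variable() are invented names, and a real IR would look at
the offset node rather than a flag): a variable is split only if every
access to it uses a compile-time constant offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access record; a real IR would point at an offset node. */
struct var_access
{
    bool offset_is_constant;
    unsigned int offset; /* only meaningful when offset_is_constant is set */
};

/* A variable can be split only if no access uses a dynamic offset. */
static bool can_split_variable(const struct var_access *accesses, size_t count)
{
    size_t i;

    assert(accesses || !count);
    for (i = 0; i < count; ++i)
    {
        if (!accesses[i].offset_is_constant)
            return false;
    }
    return true;
}
```

A single dynamically indexed access is enough to keep the whole
variable unsplit, which is the "bad" case we accept.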

With all that said: WRT copy propagation and this patch specifically,
I think it's a good idea to only handle vector variables if it makes
things easier (as it should). Notice that you don't have to bail out
entirely even in the "bad" case, as a non-vector is perfectly fine as
a "value". It's only when the complex variable is the destination of a
store that we're in trouble.

> In the specific case of my copy propagation pass, this would make things
> more complicated. For example, if right now I cannot reconstruct the
> offset of a store, I can just invalidate the whole variable. In your
> model, as I get it, I'd also have to invalidate other variables that
> are unrelated by that point.

I don't think that's the case? A STORE is always directed to a
specific temporary variable and will affect that one alone. I guess
you were thinking of a model where you always split variables into
vectors no matter what, in which case you're right, it quickly becomes
a mess...
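
For illustration only, the per-variable bookkeeping could look roughly
like the sketch below. This is not the code from the patch: struct
var_state, record_store() and lookup_load() are hypothetical names, and
I'm assuming instruction ids start at 1 so that 0 can mean "unknown". A
store with an unknown offset wipes the state of its own destination
variable and nothing else.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COMPONENTS 16u
#define NO_VALUE 0u /* instruction ids are assumed to start at 1 */

struct component_value
{
    unsigned int value_id;  /* instruction that produced the stored value */
    unsigned int component; /* which component of that value */
};

struct var_state
{
    unsigned int component_count;
    struct component_value values[MAX_COMPONENTS];
};

static void var_state_invalidate(struct var_state *var)
{
    unsigned int i;

    for (i = 0; i < var->component_count; ++i)
    {
        var->values[i].value_id = NO_VALUE;
        var->values[i].component = 0;
    }
}

static void var_state_init(struct var_state *var, unsigned int component_count)
{
    assert(component_count <= MAX_COMPONENTS);
    var->component_count = component_count;
    var_state_invalidate(var);
}

/* Record a store of `width` components of value `value_id`. When the
 * offset is not known, only this variable is invalidated. */
static void record_store(struct var_state *var, bool offset_known,
        unsigned int offset, unsigned int width, unsigned int value_id)
{
    unsigned int i;

    if (!offset_known)
    {
        var_state_invalidate(var);
        return;
    }

    assert(offset + width <= var->component_count);
    for (i = 0; i < width; ++i)
    {
        var->values[offset + i].value_id = value_id;
        var->values[offset + i].component = i;
    }
}

/* Return the id of the value whose first `width` components can replace
 * the load, or NO_VALUE. This sketch is conservative: anything that
 * would need a swizzle is reported as unknown. */
static unsigned int lookup_load(const struct var_state *var,
        unsigned int offset, unsigned int width)
{
    unsigned int i, value_id;

    assert(width && offset + width <= var->component_count);
    value_id = var->values[offset].value_id;
    if (value_id == NO_VALUE)
        return NO_VALUE;

    for (i = 0; i < width; ++i)
    {
        const struct component_value *v = &var->values[offset + i];

        if (v->value_id != value_id || v->component != i)
            return NO_VALUE;
    }
    return value_id;
}
```

With one var_state per variable, an unresolvable store to a matrix
leaves the state of every other variable untouched.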