[PATCH 2/5] d3dcompiler: Introduce a separate structure for lvalue derefs.

Sun Apr 19 16:06:39 CDT 2020

On 4/17/20 5:43 PM, Matteo Bruni wrote:
> On Fri, Apr 17, 2020 at 6:57 PM Zebediah Figura <zfigura at codeweavers.com> wrote:
>>
>> On 4/17/20 2:03 AM, Matteo Bruni wrote:
>>> On Wed, Apr 15, 2020 at 2:41 AM Zebediah Figura <z.figura12 at gmail.com> wrote:
>>>>
>>>> From: Zebediah Figura <zfigura at codeweavers.com>
>>>>
>>>> Signed-off-by: Zebediah Figura <z.figura12 at gmail.com>
>>>> ---
>>>> aaa625217, and the approach it implies, works, but on further reflection it's
>>>> still not very pretty. For example, the following line of HLSL:
>>>>
>>>> var.a.b = 2.0;
>>>>
>>>> produces the following IR:
>>>>
>>>> 2: 2.0
>>>> 3: deref(var)
>>>> 4: @3.b
>>>> 5: @4.c = @2
>>>>
>>>> This essentially works, provided that the codegen layer knows how to unwind a
>>>> deref chain, but it's pretty janky. Node 4 implies we're reading from node 3,
>>>> and node 3 implies we're reading from var, neither of which is actually
>>>> happening. The RA pass can't easily be smart enough to recognize that 4 doesn't
>>>> need to read from 3, which is a problem if we introduce a "maybe uninitialized"
>>>> warning.
>>>>
>>>> Moreover, if we introduce DCE, we can't DCE out node 3 because of node 4, and
>>>> while we could DCE out node 4 in terms of removing it from the list, we can't
>>>> actually destroy the node, since node 5 is using it. Having nodes not in the
>>>> list is also janky, I feel.
>>>>
>>>> With this patch we instead get the following IR:
>>>>
>>>> 2: 2.0
>>>> 3: deref(var)
>>>> 4: @3.b
>>>> 5: @4.c
>>>> 6: deref(var).b.c = @2
>>>>
>>>> We still get redundant reads, but at least this time we can easily DCE them
>>>> out. Needing less sanity checks in RA is also nice.
>>>>
>>>>   dlls/d3dcompiler_43/d3dcompiler_private.h | 21 +++++-
>>>>   dlls/d3dcompiler_43/hlsl.y                | 28 +++++---
>>>>   dlls/d3dcompiler_43/utils.c               | 83 ++++++++++++++++-------
>>>>   3 files changed, 97 insertions(+), 35 deletions(-)
>>>
>>> Hi Zeb,
>>>
>>> after reading it again and actually getting into "review mode", I have
>>> some more thoughts... Sorry for not thinking of this earlier.
>>> See below.
>>>
>>>> diff --git a/dlls/d3dcompiler_43/d3dcompiler_private.h b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> index 1f1d8e26637..8f81271383b 100644
>>>> --- a/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> +++ b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> @@ -873,10 +873,29 @@ struct hlsl_ir_deref
>>>>       struct hlsl_deref src;
>>>>   };
>>>>
>>>> +struct hlsl_lvalue
>>>> +{
>>>> +    enum hlsl_ir_deref_type type;
>>>> +    union
>>>> +    {
>>>> +        struct hlsl_ir_var *var;
>>>> +        struct
>>>> +        {
>>>> +            struct hlsl_lvalue *array;
>>>> +            struct hlsl_ir_node *index;
>>>> +        } array;
>>>> +        struct
>>>> +        {
>>>> +            struct hlsl_lvalue *record;
>>>> +            struct hlsl_struct_field *field;
>>>> +        } record;
>>>> +    } v;
>>>> +};
>>>> +
>>>>   struct hlsl_ir_assignment
>>>>   {
>>>>       struct hlsl_ir_node node;
>>>> -    struct hlsl_deref lhs;
>>>> +    struct hlsl_lvalue lhs;
>>>>       struct hlsl_ir_node *rhs;
>>>>       unsigned char writemask;
>>>>   };
>>>> diff --git a/dlls/d3dcompiler_43/hlsl.y b/dlls/d3dcompiler_43/hlsl.y
>>>> index d3ef18bb6e9..5366181f356 100644
>>>> --- a/dlls/d3dcompiler_43/hlsl.y
>>>> +++ b/dlls/d3dcompiler_43/hlsl.y
>>>> @@ -2642,16 +2642,16 @@ static unsigned int index_instructions(struct list *instrs, unsigned int index)
>>>>   }
>>>>
>>>>   /* Walk the chain of derefs and retrieve the actual variable we care about. */
>>>> -static struct hlsl_ir_var *hlsl_var_from_deref(const struct hlsl_deref *deref)
>>>> +static struct hlsl_ir_var *hlsl_var_from_lvalue(const struct hlsl_lvalue *deref)
>>>>   {
>>>>       switch (deref->type)
>>>>       {
>>>>           case HLSL_IR_DEREF_VAR:
>>>>               return deref->v.var;
>>>>           case HLSL_IR_DEREF_ARRAY:
>>>> -            return hlsl_var_from_deref(&deref_from_node(deref->v.array.array)->src);
>>>> +            return hlsl_var_from_lvalue(deref->v.array.array);
>>>>           case HLSL_IR_DEREF_RECORD:
>>>> -            return hlsl_var_from_deref(&deref_from_node(deref->v.record.record)->src);
>>>> +            return hlsl_var_from_lvalue(deref->v.record.record);
>>>>       }
>>>>       assert(0);
>>>>       return NULL;
>>>> @@ -2674,7 +2674,7 @@ static void compute_liveness_recurse(struct list *instrs, unsigned int loop_firs
>>>>           case HLSL_IR_ASSIGNMENT:
>>>>           {
>>>>               struct hlsl_ir_assignment *assignment = assignment_from_node(instr);
>>>> -            var = hlsl_var_from_deref(&assignment->lhs);
>>>> +            var = hlsl_var_from_lvalue(&assignment->lhs);
>>>>               if (!var->first_write)
>>>>                   var->first_write = loop_first ? min(instr->index, loop_first) : instr->index;
>>>>               assignment->rhs->last_read = instr->index;
>>>> @@ -2693,10 +2693,22 @@ static void compute_liveness_recurse(struct list *instrs, unsigned int loop_firs
>>>>           case HLSL_IR_DEREF:
>>>>           {
>>>>               struct hlsl_ir_deref *deref = deref_from_node(instr);
>>>> -            var = hlsl_var_from_deref(&deref->src);
>>>> -            var->last_read = loop_last ? max(instr->index, loop_last) : instr->index;
>>>> -            if (deref->src.type == HLSL_IR_DEREF_ARRAY)
>>>> -                deref->src.v.array.index->last_read = instr->index;
>>>> +
>>>> +            switch (deref->src.type)
>>>> +            {
>>>> +                case HLSL_IR_DEREF_VAR:
>>>> +                    deref->src.v.var->last_read = loop_last ? max(instr->index, loop_last) : instr->index;
>>>> +                    break;
>>>> +
>>>> +                case HLSL_IR_DEREF_ARRAY:
>>>> +                    deref->src.v.array.array->last_read = instr->index;
>>>> +                    deref->src.v.array.index->last_read = instr->index;
>>>> +                    break;
>>>> +
>>>> +                case HLSL_IR_DEREF_RECORD:
>>>> +                    deref->src.v.record.record->last_read = instr->index;
>>>> +                    break;
>>>> +            }
>>>>               break;
>>>>           }
>>>>           case HLSL_IR_EXPR:
>>>> diff --git a/dlls/d3dcompiler_43/utils.c b/dlls/d3dcompiler_43/utils.c
>>>> index d24341329d3..519f50612e8 100644
>>>> --- a/dlls/d3dcompiler_43/utils.c
>>>> +++ b/dlls/d3dcompiler_43/utils.c
>>>> @@ -1478,27 +1478,41 @@ static unsigned int invert_swizzle(unsigned int *swizzle, unsigned int writemask
>>>>       return new_writemask;
>>>>   }
>>>>
>>>> -static BOOL validate_lhs_deref(const struct hlsl_ir_node *lhs)
>>>> +static BOOL get_assignment_lhs(struct hlsl_lvalue *dst, struct hlsl_ir_node *node)
>>>>   {
>>>>       struct hlsl_ir_deref *deref;
>>>>
>>>> -    if (lhs->type != HLSL_IR_DEREF)
>>>> +    if (node->type != HLSL_IR_DEREF)
>>>>       {
>>>> -        hlsl_report_message(lhs->loc, HLSL_LEVEL_ERROR, "invalid lvalue");
>>>> +        hlsl_report_message(node->loc, HLSL_LEVEL_ERROR, "invalid lvalue");
>>>>           return FALSE;
>>>>       }
>>>>
>>>> -    deref = deref_from_node(lhs);
>>>> +    deref = deref_from_node(node);
>>>> +    dst->type = deref->src.type;
>>>>
>>>> -    if (deref->src.type == HLSL_IR_DEREF_VAR)
>>>> -        return TRUE;
>>>> -    if (deref->src.type == HLSL_IR_DEREF_ARRAY)
>>>> -        return validate_lhs_deref(deref->src.v.array.array);
>>>> -    if (deref->src.type == HLSL_IR_DEREF_RECORD)
>>>> -        return validate_lhs_deref(deref->src.v.record.record);
>>>> +    switch (deref->src.type)
>>>> +    {
>>>> +        case HLSL_IR_DEREF_VAR:
>>>> +            dst->v.var = deref->src.v.var;
>>>> +            return TRUE;
>>>>
>>>> -    assert(0);
>>>> -    return FALSE;
>>>> +        case HLSL_IR_DEREF_ARRAY:
>>>> +            dst->v.array.index = deref->src.v.array.index;
>>>> +            if (!(dst->v.array.array = d3dcompiler_alloc(sizeof(*dst->v.array.array))))
>>>> +                return FALSE;
>>>> +            return get_assignment_lhs(dst->v.array.array, deref->src.v.array.array);
>>>> +
>>>> +        case HLSL_IR_DEREF_RECORD:
>>>> +            dst->v.record.field = deref->src.v.record.field;
>>>> +            if (!(dst->v.record.record = d3dcompiler_alloc(sizeof(*dst->v.record.field))))
>>>> +                return FALSE;
>>>> +            return get_assignment_lhs(dst->v.record.record, deref->src.v.record.record);
>>>> +
>>>> +        default:
>>>> +            assert(0);
>>>> +            return FALSE;
>>>> +    }
>>>>   }
>>>
>>> I'm somewhat annoyed by this, it feels like we could directly create
>>> lvalue nodes while parsing... Except there doesn't seem to be any
>>> simple way. So I'm going to try and keep my initial knee-jerk reaction
>>> at bay.
>>>
>>> On the other hand, I think it would be nicer to enforce a flat IR
>>> structure. Instead of having an arbitrarily complex lvalue inside the
>>> assignment instruction, maybe it should have the same structure of a
>>> deref, where each "level" is a separate instruction. This is probably
>>> why I was thinking of a flag rather than a separate node type: when
>>> constructing the assignment you would traverse the lhs derefs and turn
>>> them into lvalues. Although having a separate node type and changing
>>> that works as well (and it's probably nicer with liveness computation
>>> and other passes).
>>>
>>> Does that sound reasonable? Am I missing anything?
>>
>> Well, note that one problem is we kind of have to (recursively) dup the
>> lvalue anyway, so that complex assignments work.
> 
> I see your point. I guess a possibility opened by the flat derefs is
> that you could share intermediate derefs, although that might well
> make things more complicated for no benefit.

I mean, it could potentially be done. I guess the easiest way that
occurs to me is to set a flag that just states whether a HLSL_IR_DEREF
is read from...

>> I think I understand why flattening is desirable in general, but it's
>> not clear to me how that would apply to lvalues (or even to rvalue deref
>> chains, honestly), since at codegen time we essentially resolve
>> arbitrarily complex deref chains into one register (well, except for
>> non-constant array indexing...)
> 
> True, but the same can be said in principle for any other instruction.

What I meant to say is that an expression like "(a + b) * c" is
necessarily two instructions, whereas "a.b.c.<...>.e = f" is always one,
regardless of the length of the deref chain.

> Also if you always "go flat" you have only one way to follow sources,
> instead of having to mix tree traversing inside a single instruction
> with instruction traversing.
> Relatedly, if we start doing more invasive IR transformations, having
> simple, "atomic" instructions might make things easier. E.g. with
> structure splitting, I think you'd end up with replacing one
> instruction (as in one instruction list entry) without changing the
> rest of the IR, while in the other case you have to replace a piece of
> a larger instruction which might be buried arbitrarily deep in the
> tree.
> Admittedly this is all very abstract and I might not be seeing a
> sensible overall picture.
> 

I guess the way I look at it is, emitting an assignment instruction
means we have to reach up the entire chain at codegen time anyway, i.e.
I'm not sure how you can emit an instruction for each individual lvalue
deref node, whereas you can emit (at least) one instruction for each
node currently.

To be clear, I don't think it's awful to emit nodes for lvalue derefs,
I'm just not convinced it's better—but I'll defer to the maintainer in
any case.

>> It's also not obvious to me that's prettier to flatten it, given on the
>> other hand we're also now inserting nodes into the IR list that would
>> presumably not participate in RA or generate code.
> 
> Right, but that's a bit like doing an optimization right while parsing.
> I guess we want lvalues and rvalues / derefs to either be both flat or
> both tree-like. Let's focus on rvalues for a moment.
> Imagine you have a complicated deref chain, eventually picking a
> single float out of something larger and complex. If you have a very
> naive compiler you can imagine that each step of the deref is
> implemented as a copy of whatever the result of the deref step is into
> a temporary variable. This might mean e.g. temporarily copying a full
> matrix before getting at the individual component. Incredibly
> inefficient, but simple and consistent.
> Obviously this isn't very interesting and can be simplified right
> away... except when it can't, like the non-constant array indexing
> case you mention. Or copying a whole input structure to an output
> variable.
> For lvalues you can imagine the specular thing, where the naive
> compiler stores results into a temporary variable and then copies that
> into the actual destination.
> 
> It seems to me that it would be nicer to have a simple and flat IR
> structure where each entry can do very little, or even nothing, in
> practice, and handle simplifications and IR restructuring over a
> number of passes rather than the opposite. Somewhat related, you can
> already get instructions that eventually do nothing of consequence
> (e.g. dead code, swizzling into the exact same thing, ...) and I don't
> think that has to be necessarily that different from the compiler's
> point of view.

So I'm not entirely sure from this if you have something specific in
mind, but I'll at least run through my thoughts on the matter.

The compiler that exists in at least two of my local trees does emit a
copy for every HLSL_IR_DEREF, i.e. from a variable into a temp node.
This is awful, inasmuchas it matters, but the bottom line is by design,
HLSL_IR_DEREF is essentially completely redundant, as you say.

One way to get around this—and the approach implied if we do make each
deref an explicit node—is to reach through all the derefs at codegen
time and grab the relevant register. We'd skip actually emitting code
for any deref node. Hence an expression like "func().b.c + 1" would
generate IR like

2: func()
3: @2.b
4: @3.c
5: 1
6: @4 + @5

Inlining aside, we'd only allocate registers for 2, 5, 6, and the source
to node 6 would offset the register allocated to node 2 by the offset of
"b.c".

Note that liveness analysis would also have to recursively reach through
deref nodes.

The advantage of this, I guess, is that if we really do want to make
lvalues separate nodes, then rvalues should probably be separate nodes too.

The second alternative that I was looking at was to make this a bit more
explicit, and essentially let the hlsl_src structure contain a union of
hlsl_ir_node and hlsl_deref (note that hlsl_deref.array/record would
then have to be a pointer to hlsl_src instead of the flat structure as
patch 0005 has it). Then we'd essentially generate

2: func()
3: 1
4: @2.b.c + 3

I mildly prefer this, since I like having the IR language be explicit
about what's allowed and to resemble the code we produce, and it's not
difficult to form this even at parse time. We can't easily get rid of
HLSL_IR_DEREF entirely here, but we can guarantee it doesn't occur
except at parse time (plus maybe a dce run?)

A third alternative is to adopt my patches 010* from my 6 April mail,
i.e. to make both lvalue and rvalue derefs explicitly chains of derefs,
and synthesize variables when necessary. That doesn't preclude folding
hlsl_deref into hlsl_src, of course.

>> Maybe I'm misunderstanding what you're trying to propose here, though?
> 
> I don't think you are. I might be missing something more or less
> obvious to you though.
>