[PATCH 4/5] d3dcompiler: Store derefs as an offset to a variable.

Tue May 19 16:14:39 CDT 2020

On 5/19/20 3:44 PM, Matteo Bruni wrote:
> On Tue, May 19, 2020 at 10:11 PM Zebediah Figura
> <zfigura at codeweavers.com> wrote:
>>
>> On 5/19/20 2:34 PM, Matteo Bruni wrote:
>>> On Mon, May 4, 2020 at 10:04 PM Zebediah Figura <z.figura12 at gmail.com> wrote:
>>>>
>>>> Signed-off-by: Zebediah Figura <zfigura at codeweavers.com>
>>>> ---
>>>> One potential change to this patch is to always store an offset. Currently NULL
>>>> is used to signify an offset of 0, partly to save space, and partly because
>>>> otherwise new_var_deref() would need a bit of restructuring. But always using an
>>>> offset would simplify some code.
>>>
>>> Yeah, that might be a nice followup. Or, it feels like one, without
>>> having actually touched the code at all...
>>>
>>>>
>>>>  dlls/d3dcompiler_43/d3dcompiler_private.h |  25 +---
>>>>  dlls/d3dcompiler_43/hlsl.y                | 143 +++++++++++++---------
>>>>  dlls/d3dcompiler_43/utils.c               |  76 ++----------
>>>>  3 files changed, 97 insertions(+), 147 deletions(-)
>>>>
>>>> diff --git a/dlls/d3dcompiler_43/d3dcompiler_private.h b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> index 387e932a6cc..983057d543c 100644
>>>> --- a/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> +++ b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>> @@ -841,30 +841,11 @@ struct hlsl_ir_swizzle
>>>>      DWORD swizzle;
>>>>  };
>>>>
>>>> -enum hlsl_ir_deref_type
>>>> -{
>>>> -    HLSL_IR_DEREF_VAR,
>>>> -    HLSL_IR_DEREF_ARRAY,
>>>> -    HLSL_IR_DEREF_RECORD,
>>>> -};
>>>> -
>>>>  struct hlsl_deref
>>>>  {
>>>> -    enum hlsl_ir_deref_type type;
>>>> -    union
>>>> -    {
>>>> -        struct hlsl_ir_var *var;
>>>> -        struct
>>>> -        {
>>>> -            struct hlsl_ir_node *array;
>>>> -            struct hlsl_ir_node *index;
>>>> -        } array;
>>>> -        struct
>>>> -        {
>>>> -            struct hlsl_ir_node *record;
>>>> -            struct hlsl_struct_field *field;
>>>> -        } record;
>>>> -    } v;
>>>> +    struct hlsl_ir_var *var;
>>>> +    struct hlsl_ir_node *offset;
>>>> +    DWORD writemask;
>>>
>>> I don't like moving the writemask into struct hlsl_deref. It's
>>> something that makes sense only for an lvalue / assignment / STORE.
>>> There are a few options here. Maybe turn struct hlsl_deref into struct
>>> hlsl_lvalue and use it only in struct hlsl_ir_assignment (i.e. folding
>>> the other use into struct hlsl_ir_deref while getting rid of the
>>> writemask)? Do we lose a lot by "splitting" struct hlsl_deref?
>>
>> "writemask" may be a bit of a misnomer, since there's not necessarily
>> writing (maybe "mask" is better?), but it's relevant for loads too. It's
>> essentially a companion to 'offset', to describe which part of the vec4
>> should be read from or written to. For loads we convert the mask into a
>> swizzle.
> 
> It shouldn't be necessary though, if the offset is in the unit of
> (scalar) components rather than 4-component registers. Per-component
> offset, together with the data type should be enough.
> Now, of course I haven't really thought it through so there might be dragons...

Yes, you're right. I've been dealing with the offset in registers, but
there's not really a good reason to do that...

> 
>>>> +
>>>> +    if (!(c = d3dcompiler_alloc(sizeof(*c))))
>>>> +        return NULL;
>>>> +    init_node(&c->node, HLSL_IR_CONSTANT, type, loc);
>>>> +    c->v.value.u[0] = n;
>>>> +    return c;
>>>> +}
>>>
>>> WRT memory usage, in particular when always storing the offset: we
>>> could allocate only the required portion of the whole struct
>>> hlsl_ir_constant i.e. something along the lines of
>>> "d3dcompiler_alloc(FIELD_OFFSET(struct hlsl_ir_constant,
>>> v.value.u[1]))". If the "tricky" allocation happens in a single place,
>>> like a new_constant() helper or some-such, maybe it won't be too
>>> ugly...
>>> BTW I'm not sure that the rest of the compiler is ready for this kind
>>> of thing. Let's call it another "nice to have at some point".
>>
>> This is potentially a good idea, but it also conflicts with one of two
>> possible ideas I had for a different problem:
>>
>> I think we need some way of replacing instructions after parse time,
>> including with instructions of a different type. It's useful for
>> splitting, constant folding, lowering of one operation to another, and
>> so on.
>>
>> One way to do this is to track def-use chains. This also lets us
>> simplify liveness analysis a bit, but it's not an awful lot of code that
>> we save that way.
>>
>> The other thing that we could do is to make hlsl_ir_node a union. I
>> think this makes some code simpler, of course at the expense of using a
>> bit more memory.
>>
>> Ultimately I don't have a strong preference, but it is something that'd
>> be nice to work out soon, as it does strongly affect the structure of
>> almost all code I subsequently write...
> 
> So your idea is to replace an instruction with a different one "in
> place" so that all the pointers that refer to it don't need to be
> updated. My gut feeling is that it might not be enough in the general
> case. In particular, if you split an instruction into multiple
> instructions (e.g. maybe some matrix operation into per-vector
> arithmetic) you might want to point a users of the original
> instruction to any of the new ones.
> This kind of transformation would break "plain" def-use chain
> replacement too, but that seems solvable.

I think that in any such case we also have to replace the new
instruction, though. Put another way, if an instruction consumes a
matrix, then it still needs to consume a matrix even once the
instruction producing a matrix has been split to produce vectors. I
don't think it ever makes sense to change the source of an instruction
to be a different type.

To try to express how I'm planning to split things, given an expression
like "mat1 = mat2 * 2" we generate

2: load(mat2)
3: 2
4: @2 * @3
5: mat1 = @4

We split @4, yielding

 2: load(mat2)
 3: 2
 4: @2 * @3
 5: load(mat2[0])
 6: @5 * @3
 7: <syn-4>[0] = @6
 8: load(mat2[1])
 9: @8 * @3
10: <syn-4>[1] = @9
11: mat1 = @4

We split @11, yielding

    ...
10: <syn-4>[1] = @9
11: mat1 = @4
12: load(<syn-4>[0])
13: mat1[0] = @12
14: load(<syn-4>[1])
15: mat1[1] = @14

@11 is removed after splitting, and @4 is subsequently dce'd out. The
crucially relevant bit is that we keep both instructions around after
splitting them, and we don't actually change the source. This was the
simplest approach I could come up with, because it means we can
essentially split one instruction at a time, without having to recurse.

> 
>>>> +
>>>>  static struct hlsl_ir_deref *new_var_deref(struct hlsl_ir_var *var, const struct source_location loc)
>>>>  {
>>>>      struct hlsl_ir_deref *deref = d3dcompiler_alloc(sizeof(*deref));
>>>> @@ -557,60 +571,76 @@ static struct hlsl_ir_deref *new_var_deref(struct hlsl_ir_var *var, const struct
>>>>          return NULL;
>>>>      }
>>>>      init_node(&deref->node, HLSL_IR_DEREF, var->data_type, loc);
>>>> -    deref->src.type = HLSL_IR_DEREF_VAR;
>>>> -    deref->src.v.var = var;
>>>> +    deref->src.var = var;
>>>> +    if (type_is_single_reg(var->data_type))
>>>> +        deref->src.writemask = (1 << var->data_type->dimx) - 1;
>>>>      return deref;
>>>>  }
>>>>
>>>> -static struct hlsl_ir_node *get_var_deref(struct hlsl_ir_node *node)
>>>> +static struct hlsl_ir_deref *new_deref(struct hlsl_ir_node *var_node, struct hlsl_ir_node *offset,
>>>> +        struct hlsl_type *data_type, const struct source_location loc)
>>>>  {
>>>> -    struct hlsl_ir_assignment *assign;
>>>> +    struct hlsl_ir_node *add = NULL;
>>>>      struct hlsl_ir_deref *deref;
>>>>      struct hlsl_ir_var *var;
>>>> -    char name[27];
>>>>
>>>> -    if (node->type == HLSL_IR_DEREF)
>>>> -        return node;
>>>> +    if (var_node->type == HLSL_IR_DEREF)
>>>> +    {
>>>> +        const struct hlsl_deref *src = &deref_from_node(var_node)->src;
>>>>
>>>> -    sprintf(name, "<deref-%p>", node);
>>>> -    if (!(var = new_synthetic_var(name, node->data_type, node->loc)))
>>>> -        return NULL;
>>>> +        var = src->var;
>>>> +        if (src->offset)
>>>> +        {
>>>> +            if (!(add = new_binary_expr(HLSL_IR_BINOP_ADD, src->offset, offset, loc)))
>>>> +                return NULL;
>>>> +            list_add_after(&offset->entry, &add->entry);
>>>> +            offset = add;
>>>> +        }
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        struct hlsl_ir_assignment *assign;
>>>> +        char name[27];
>>>> +
>>>> +        sprintf(name, "<deref-%p>", var_node);
>>>> +        if (!(var = new_synthetic_var(name, var_node->data_type, var_node->loc)))
>>>> +            return NULL;
>>>>
>>>> -    TRACE("Synthesized variable %p for %s node.\n", var, debug_node_type(node->type));
>>>> +        TRACE("Synthesized variable %p for %s node.\n", var, debug_node_type(var_node->type));
>>>>
>>>> -    if (!(assign = make_simple_assignment(var, node)))
>>>> -        return NULL;
>>>> -    list_add_after(&node->entry, &assign->node.entry);
>>>> +        if (!(assign = make_simple_assignment(var, var_node)))
>>>> +            return NULL;
>>>>
>>>> -    if (!(deref = new_var_deref(var, var->loc)))
>>>> +        list_add_after(&var_node->entry, &assign->node.entry);
>>>> +    }
>>>> +
>>>> +    if (!(deref = d3dcompiler_alloc(sizeof(*deref))))
>>>>          return NULL;
>>>> -    list_add_after(&assign->node.entry, &deref->node.entry);
>>>> -    return &deref->node;
>>>> +    init_node(&deref->node, HLSL_IR_DEREF, data_type, loc);
>>>> +    deref->src.var = var;
>>>> +    deref->src.offset = offset;
>>>> +    if (type_is_single_reg(data_type))
>>>> +        deref->src.writemask = (1 << data_type->dimx) - 1;
>>>> +    list_add_after(&offset->entry, &deref->node.entry);
>>>> +    return deref;
>>>>  }
>>>
>>> How bad is it to avoid creating "extra" deref nodes? Mostly curious,
>>> it's okay to leave dead code around.
>>>
>>
>> So I'm guessing that what you mean by this is: given an expression like
>> "a.b.c", instead of
>>
>> 2: a
>> 3: 2 # offset of b
>> 4: a[@3]
>> 5: 2 # offset of c
>> 6: @3 + @5
>> 7: a[@6]
>>
>> we delete or replace @2 and @4 when generating @4 and @7, respectively.
>>
>> It's a bit tricky, because while in practice I think that instructions
>> @2 and @4 will have no uses, that's not really spelled out as a
>> guarantee anywhere, and having it be a thing to implicitly remember
>> makes me a *little* nervous. (I'm not sure I trust myself to implicitly
>> remember it.) If we track def-use chains, that's less of a problem, but
>> I'm still not sure whether we want to track def-use chains—see above.
>>
>> (Arguably we need def-use anyway, for DCE, but we don't really—we only
>> need to check whether a field is used, not where, so we can just reuse
>> the liveness analysis passes we already have.)
> 
> Yes, that was my question, and yeah, I see. Definitely no point in
> complicating this for no reason aside from making the IR nicer to
> read.
> 
> Also agreed, the current liveness stuff should be enough for DCE,
> especially for the time being.
>