[PATCH 4/5] d3dcompiler: Store derefs as an offset to a variable.

Tue May 19 17:11:17 CDT 2020

On 5/19/20 5:07 PM, Matteo Bruni wrote:
> On Tue, May 19, 2020 at 11:14 PM Zebediah Figura
> <zfigura at codeweavers.com> wrote:
>>
>> On 5/19/20 3:44 PM, Matteo Bruni wrote:
>>> On Tue, May 19, 2020 at 10:11 PM Zebediah Figura
>>> <zfigura at codeweavers.com> wrote:
>>>>
>>>> On 5/19/20 2:34 PM, Matteo Bruni wrote:
>>>>> On Mon, May 4, 2020 at 10:04 PM Zebediah Figura <z.figura12 at gmail.com> wrote:
>>>>>>
>>>>>> Signed-off-by: Zebediah Figura <zfigura at codeweavers.com>
>>>>>> ---
>>>>>> One potential change to this patch is to always store an offset. Currently NULL
>>>>>> is used to signify an offset of 0, partly to save space, and partly because
>>>>>> otherwise new_var_deref() would need a bit of restructuring. But always using an
>>>>>> offset would simplify some code.
>>>>>
>>>>> Yeah, that might be a nice followup. Or, it feels like one, without
>>>>> having actually touched the code at all...
>>>>>
>>>>>>
>>>>>>   dlls/d3dcompiler_43/d3dcompiler_private.h |  25 +---
>>>>>>   dlls/d3dcompiler_43/hlsl.y                | 143 +++++++++++++---------
>>>>>>   dlls/d3dcompiler_43/utils.c               |  76 ++----------
>>>>>>   3 files changed, 97 insertions(+), 147 deletions(-)
>>>>>>
>>>>>> diff --git a/dlls/d3dcompiler_43/d3dcompiler_private.h b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>>>> index 387e932a6cc..983057d543c 100644
>>>>>> --- a/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>>>> +++ b/dlls/d3dcompiler_43/d3dcompiler_private.h
>>>>>> @@ -841,30 +841,11 @@ struct hlsl_ir_swizzle
>>>>>>       DWORD swizzle;
>>>>>>   };
>>>>>>
>>>>>> -enum hlsl_ir_deref_type
>>>>>> -{
>>>>>> -    HLSL_IR_DEREF_VAR,
>>>>>> -    HLSL_IR_DEREF_ARRAY,
>>>>>> -    HLSL_IR_DEREF_RECORD,
>>>>>> -};
>>>>>> -
>>>>>>   struct hlsl_deref
>>>>>>   {
>>>>>> -    enum hlsl_ir_deref_type type;
>>>>>> -    union
>>>>>> -    {
>>>>>> -        struct hlsl_ir_var *var;
>>>>>> -        struct
>>>>>> -        {
>>>>>> -            struct hlsl_ir_node *array;
>>>>>> -            struct hlsl_ir_node *index;
>>>>>> -        } array;
>>>>>> -        struct
>>>>>> -        {
>>>>>> -            struct hlsl_ir_node *record;
>>>>>> -            struct hlsl_struct_field *field;
>>>>>> -        } record;
>>>>>> -    } v;
>>>>>> +    struct hlsl_ir_var *var;
>>>>>> +    struct hlsl_ir_node *offset;
>>>>>> +    DWORD writemask;
>>>>>
>>>>> I don't like moving the writemask into struct hlsl_deref. It's
>>>>> something that makes sense only for an lvalue / assignment / STORE.
>>>>> There are a few options here. Maybe turn struct hlsl_deref into struct
>>>>> hlsl_lvalue and use it only in struct hlsl_ir_assignment (i.e. folding
>>>>> the other use into struct hlsl_ir_deref while getting rid of the
>>>>> writemask)? Do we lose a lot by "splitting" struct hlsl_deref?
>>>>
>>>> "writemask" may be a bit of a misnomer, since there's not necessarily
>>>> writing (maybe "mask" is better?), but it's relevant for loads too. It's
>>>> essentially a companion to 'offset', to describe which part of the vec4
>>>> should be read from or written to. For loads we convert the mask into a
>>>> swizzle.
>>>
>>> It shouldn't be necessary though, if the offset is in the unit of
>>> (scalar) components rather than 4-component registers. Per-component
>>> offset, together with the data type should be enough.
>>> Now, of course I haven't really thought it through so there might be dragons...
>>
>> Yes, you're right. I've been dealing with the offset in registers, but
>> there's not really a good reason to do that...
>>
>>>
>>>>>> +
>>>>>> +    if (!(c = d3dcompiler_alloc(sizeof(*c))))
>>>>>> +        return NULL;
>>>>>> +    init_node(&c->node, HLSL_IR_CONSTANT, type, loc);
>>>>>> +    c->v.value.u[0] = n;
>>>>>> +    return c;
>>>>>> +}
>>>>>
>>>>> WRT memory usage, in particular when always storing the offset: we
>>>>> could allocate only the required portion of the whole struct
>>>>> hlsl_ir_constant i.e. something along the lines of
>>>>> "d3dcompiler_alloc(FIELD_OFFSET(struct hlsl_ir_constant,
>>>>> v.value.u[1]))". If the "tricky" allocation happens in a single place,
>>>>> like a new_constant() helper or some-such, maybe it won't be too
>>>>> ugly...
>>>>> BTW I'm not sure that the rest of the compiler is ready for this kind
>>>>> of thing. Let's call it another "nice to have at some point".
>>>>
>>>> This is potentially a good idea, but it also conflicts with one of two
>>>> possible ideas I had for a different problem:
>>>>
>>>> I think we need some way of replacing instructions after parse time,
>>>> including with instructions of a different type. It's useful for
>>>> splitting, constant folding, lowering of one operation to another, and
>>>> so on.
>>>>
>>>> One way to do this is to track def-use chains. This also lets us
>>>> simplify liveness analysis a bit, but it's not an awful lot of code that
>>>> we save that way.
>>>>
>>>> The other thing that we could do is to make hlsl_ir_node a union. I
>>>> think this makes some code simpler, of course at the expense of using a
>>>> bit more memory.
>>>>
>>>> Ultimately I don't have a strong preference, but it is something that'd
>>>> be nice to work out soon, as it does strongly affect the structure of
>>>> almost all code I subsequently write...
>>>
>>> So your idea is to replace an instruction with a different one "in
>>> place" so that all the pointers that refer to it don't need to be
>>> updated. My gut feeling is that it might not be enough in the general
>>> case. In particular, if you split an instruction into multiple
>>> instructions (e.g. maybe some matrix operation into per-vector
>>> arithmetic) you might want to point a users of the original
>>> instruction to any of the new ones.
>>> This kind of transformation would break "plain" def-use chain
>>> replacement too, but that seems solvable.
>>
>> I think that in any such case we also have to replace the new
>> instruction, though. Put another way, if an instruction consumes a
>> matrix, then it still needs to consume a matrix even once the
>> instruction producing a matrix has been split to produce vectors. I
>> don't think it ever makes sense to change the source of an instruction
>> to be a different type.
>>
>> To try to express how I'm planning to split things, given an expression
>> like "mat1 = mat2 * 2" we generate
>>
>> 2: load(mat2)
>> 3: 2
>> 4: @2 * @3
>> 5: mat1 = @4
>>
>> We split @4, yielding
>>
>>   2: load(mat2)
>>   3: 2
>>   4: @2 * @3
>>   5: load(mat2[0])
>>   6: @5 * @3
>>   7: <syn-4>[0] = @6
>>   8: load(mat2[1])
>>   9: @8 * @3
>> 10: <syn-4>[1] = @9
>> 11: mat1 = @4
>>
>> We split @11, yielding
>>
>>      ...
>> 10: <syn-4>[1] = @9
>> 11: mat1 = @4
>> 12: load(<syn-4>[0])
>> 13: mat1[0] = @12
>> 14: load(<syn-4>[1])
>> 15: mat1[1] = @14
>>
>> @11 is removed after splitting, and @4 is subsequently dce'd out. The
>> crucially relevant bit is that we keep both instructions around after
>> splitting them, and we don't actually change the source. This was the
>> simplest approach I could come up with, because it means we can
>> essentially split one instruction at a time, without having to recurse.
> 
> Cool. Yeah, that looks simpler than what I was envisioning. I'm not
> sure how turning struct hlsl_ir_node into a union would help, or
> matter, though.
> 

For that kind of splitting, it doesn't. However, it can help:

* splitting constructors, where the type doesn't need to change, and we 
can replace the HLSL_IR_CONSTRUCTOR instruction with HLSL_IR_LOAD;

* constant folding, where we can replace HLSL_IR_EXPR with HLSL_IR_CONSTANT;

* function inlining, where we replace HLSL_IR_CALL with HLSL_IR_LOAD of 
a synthetic return variable.