[PATCH vkd3d 2/5] vkd3d-shader/hlsl: Perform a copy propagation pass.

Wed Nov 10 03:57:30 CST 2021

Hi,

On 09/11/21 23:21, Zebediah Figura wrote:
> Overall this patch looks a lot better than I was afraid of. It's a lot 
> of code that's intimidating to review, but once you ignore the rbtree 
> boilerplate it's simple enough and seems about in line with what I 
> expect. There's quite a few things that I think can be simplified, but 
> the basic structure seems sound, so nice work.

Great, that's good! Thanks for your review, I think most if not all of 
the points you wrote should be easy to fix.

> The CF part is probably going to be a lot trickier, but fortunately I 
> think that this patch alone is the one that really matters. Frankly, if 
> we were to take this patch, and then a second patch that does copy-prop 
> on interior CF blocks with a fresh copy_propagation_state(), we might 
> even cover enough that it's not even worrying about CF...

Mmh, note that you have to handle control flow in some way or another, 
you cannot just ignore it. This patch doesn't know about control flow, 
but the only way it can do it is by bailing out at the first control 
flow instruction in the program. It's not enough to ignore the control 
flow block and resume processing after it, because the inner block might 
invalidate some of your knowledge about the program state (in 3/5 patch 
this is done by copy_propagation_invalidate_from_block).

>> +/* struct copy_propagation_state represents the accumulated knowledge
>> + * of the copy propagation pass while it scans through the code. Field
>> + * "variables" is a tree whose elements have type struct
>> + * copy_propagation_varible, and represent each of the variables the
>> + * pass has already encountered (except those with special semantics,
>> + * which are ignored). For each variable, the array "values" (whose
>> + * length is the register size of the variable) represent which node
>> + * and which index inside that node (i.e., which of the at most four
>> + * entries of a vector) provided that value last time. Field "node"
>> + * can be NULL, meaning that the pass was not able to statically
>> + * determine the node.
>> + */
> 
> This comment feels a bit too low-level? I dunno, comments like this are 
> hard to review, but my inclination is to describe what you're doing at a 
> high level, along the lines of "we track the last known value of each 
> component of each variable", and let the code itself explain the details.

Personally I would have loved to meet comments like mine in other areas 
of Wine: knowing what data read or written by an algorithm are meant to 
represent makes it much easier to understand why the algorithm does what 
it does. I agree that this is not the most complicated case around, but 
I thought this would be helpful for someone not already knowing my code. 
OTOH I understand that also a high-level description is useful: would 
you consider it satisfying to have both?

>> +static int copy_propagation_variable_compare(const void *key, const 
>> struct rb_entry *entry)
>> +{
>> +    struct copy_propagation_variable *variable = 
>> RB_ENTRY_VALUE(entry, struct copy_propagation_variable, entry);
>> +    uintptr_t key_int = (uintptr_t)key, entry_int = 
>> (uintptr_t)variable->var;
>> +
>> +    if (key_int < entry_int)
>> +        return -1;
>> +    else if (key_int > entry_int)
>> +        return 1;
>> +    else
>> +        return 0;
> 
> Could we just modify the rbtree implementation to use uintptr_t instead, 
> and then do a direct subtraction?

I am not sure of what you mean: the pointer in struct 
copy_propagation_variable is used as a pointer, it would not be very 
logical to store it as an integer.

Also, do you get the right think if you subtract two unsigned integers? 
For one thing the result is unsigned, but even if you cast it to signed 
you don't get an order, do you?

Even if you could, you would get a shorter code, but I am not sure it 
would be a more legible one.

>> +static struct copy_propagation_variable 
>> *copy_propagation_get_variable(struct hlsl_ctx *ctx,
>> +        struct copy_propagation_state *state, struct hlsl_ir_var *var)
>> +{
>> +    struct rb_entry *entry = rb_get(&state->variables, var);
>> +    struct copy_propagation_variable *variable;
>> +    int res;
>> +
>> +    if (entry)
>> +        return RB_ENTRY_VALUE(entry, struct 
>> copy_propagation_variable, entry);
>> +
>> +    variable = hlsl_alloc(ctx, sizeof(*variable));
>> +    if (!variable)
>> +        return NULL;
>> +
>> +    variable->var = var;
>> +    variable->values = hlsl_alloc(ctx, sizeof(*variable->values) * 
>> var->data_type->reg_size);
>> +    if (!variable->values)
>> +    {
>> +        vkd3d_free(variable);
>> +        return NULL;
>> +    }
>> +
>> +    res = rb_put(&state->variables, var, &variable->entry);
>> +    assert(!res);
> 
> Although this is a bit awkward because most of the function is only 
> relevant for stores, not loads, and I think it would actually be better 
> to reflect that in the code, one way or another (so we can do less effort).

Do less effort at what? It's true that when loading I could return NULL 
when the variable has never been seen before (meaning that we don't know 
anything about that variable, so the load cannot be removed) instead of 
creating an "empty" variable placeholder and return that one, but I 
don't really see an advantage doing that. It would seem more effort 
rather than less to me, because there are more possible branches to check.

(BTW, that would also mean that the variable is read before being 
written, which is something that shouldn't normally happen, but I don't 
think it's sensible to count on that)

>> +            if (TRACE_ON())
>> +            {
>> +                char buf[32];
>> +                if (!node)
>> +                    sprintf(buf, "(nil)");
> 
> Can that happen?

Not yet, but it will as soon as loops and conditionals enter the scene, 
because they can cause the last store to a variable to become unknown 
(at compilation time).

Now, I know that usually we don't like to have dead code around. I think 
this case could be accepted, since this code is just a line long and 
becomes alive at 3/5. Also, I find it easier to review this piece of 
code at once, instead of first reviewing a part, checking that the 
missing case will never happen and then review the missing case and 
checking that it fits well in the already-reviewed part.

But if you really don't like it, I can defer the NULL case to 3/5.

>> +                else if (node->index)
>> +                    sprintf(buf, "@%u", node->index);
>> +                else
>> +                    sprintf(buf, "%p", node);
> 
> Can that happen?

Definitely. Arguably, it might be the common case, given that the 
earlier optimization passes (including this pass itself, given that it 
might be executed more than once) might synthesize a lot of nodes, which 
won't have an index until compute_liveness is executed.

>> +                TRACE("variable %s[%d] is written by %p[%d]\n", 
>> variable->var->name, offset + index, buf, index);
> 
> This trace doesn't really match the usual format, but more saliently, 
> the hardcoded buffer is ugly. Assuming the other two cases really can 
> happen, I'd rather see individual traces spelled out.
> 
> Same thing below.

Ok. As usual, I found find that uglier, but I can live with that.

BTW, having the traces is not a condicio sine qua non for me. They were 
useful for me to develop the patch, and I left them because I figured 
that they might also be useful in the future, but we can happily get rid 
of them.

>> +    struct hlsl_deref *src = &load->src;
>> +    struct hlsl_ir_var *var = src->var;
>> +    struct hlsl_ir_swizzle *swizzle;
>> +    DWORD s;
>> +
>> +    if (var->is_input_semantic || var->is_output_semantic || 
>> var->is_uniform)
>> +        return false;
> 
> For input semantics and uniforms: yeah, but do we need to? The important 
> question is "can we reconstruct a store for this variable", and this 
> should already be false.
> 
> For output semantics, we should never get here in the first place.

I think you're right. I had put this check initially because I was 
concentrated on other aspects of the algorithm, but it's true that it is 
overly cautious.

>> +
>> +    if (type->type != HLSL_CLASS_SCALAR && type->type != 
>> HLSL_CLASS_VECTOR)
>> +        return false;
>> +
>> +    offset = hlsl_offset_from_deref(src);
> 
> The problem with hlsl_offset_from_deref(), and the reason I probably 
> should have fought against 62b25bc52b in its current form, is that it 
> not only doesn't deal with non-constant offsets, but doesn't really give 
> the caller a way to bail either. We should probably be returning bool 
> from it, and then aborting here.

I agree. Actually, the first version of my patch set had a variant of 
hlsl_offset_from_deref that acknowledged the possibility that it could 
not be possible to statically determine the offset of a deref. This 
condition is difficult to handle at code generation time, but it is not 
particularly problematic here: at load time just ignore the load, and at 
store time invalidate the whole variable.

I can totally fix hlsl_offset_from_deref so that this happens.

(speaking of constant expressions in the code, I am a bit bothered that 
evaluate_array_dimension is not able to figure out the value of constant 
expressions that are not immediate constants, and unfortunately fixing 
that is not as easy as running a fold_constants pass; I don't in mind a 
clean solution for that that doesn't require re-implementing constant 
folding there; that's another matter, of course)

>> +
>> +    variable = copy_propagation_get_variable(ctx, state, var);
>> +    if (!variable)
>> +        return false;
>> +
>> +    new_node = copy_propagation_reconstruct_node(variable, offset, 
>> type->dimx, indices);
>> +
>> +    if (TRACE_ON())
>> +    {
>> +        char buf[32];
>> +        if (!new_node)
>> +            sprintf(buf, "(nil)");
> 
> Is this useful to trace?

As before, it is a matter of tastes, I believe. I can drop it, just as I 
can drop the traces above.

>> +        else if (new_node->index)
>> +            sprintf(buf, "@%u", new_node->index);
>> +        else
>> +            sprintf(buf, "%p", new_node);
> 
> Can this happen?

Yes, just as above.

>> +    variable = copy_propagation_get_variable(ctx, state, var);
>> +    if (!variable)
>> +        return false;
>> +
>> +    copy_propagation_set_value(variable, hlsl_offset_from_deref(lhs), 
>> store->writemask, store->rhs.node);
>> +
>> +    return false;
>> +}
> 
> Shouldn't this function just return void?

It's a matter of philosophy. If you ask me, the switch in 
copy_propagation_recursive should be rather oblivious of what the 
various helpers actually do, it shouldn't have an insight that 
copy_propagation_store never returns true. For its point of view, any 
helper can potentially modify the code. Though I can change to void if 
you prefer.

>> +static bool copy_propagation_recursive(struct hlsl_ctx *ctx, struct 
>> hlsl_block *block,
>> +        struct copy_propagation_state *state)

So, no comment for a function called "recursive" which is not (yet) 
actually recursive? :-P

>> @@ -1354,7 +1601,12 @@ int hlsl_emit_dxbc(struct hlsl_ctx *ctx, struct 
>> hlsl_ir_function_decl *entry_fun
>>           progress |= transform_ir(ctx, split_struct_copies, body, NULL);
>>       }
>>       while (progress);
>> -    while (transform_ir(ctx, fold_constants, body, NULL));
>> +    do
>> +    {
>> +        progress = transform_ir(ctx, fold_constants, body, NULL);
>> +        progress |= copy_propagation_pass(ctx, body);
> 
> This probably isn't worth examining, but I'm curious why constant 
> folding is useful after copy-prop.

Copy propagation might enable some additional steps of constant folding, 
if a constant is first stored, than loaded and then an operation is 
carried on it. OTOH, constant folding can enable some additional step of 
copy propagation if it makes an offset statically visible.

Thanks, Giovanni.