[PATCH vkd3d 2/5] vkd3d-shader/hlsl: Perform a copy propagation pass.

Thu Nov 11 15:24:43 CST 2021

On 11/11/21 3:18 AM, Giovanni Mascellani wrote:
> Hi,
> 
> On 10/11/21 17:33, Zebediah Figura (she/her) wrote:
>>> Mmh, note that you have to handle control flow in some way or another,
>>> you cannot just ignore it. This patch doesn't know about control flow,
>>> but the only way it can do it is by bailing out at the first control
>>> flow instruction in the program. It's not enough to ignore the control
>>> flow block and resume processing after it, because the inner block
>>> might invalidate some of your knowledge about the program state (in
>>> 3/5 patch this is done by copy_propagation_invalidate_from_block).
>>
>> Right. I should have said to invalidate the whole copyprop state both
>> when entering a CF block and when leaving one. The point is that we can
>> do copyprop within every BB, not just the first one, without having to
>> worry about things like partial invalidation.
> 
> I guess that by now I've already read the other mail I sent on that, but
> the point is that wouldn't be enough to get texture working, I think.

It should at least be enough for our unit tests, but yeah, if there's 
shaders that you've encountered that do texture access from within CF, I 
guess we'll need more sophistication.

Fortunately it's also something that's relatively easy to do one step at 
a time.

> 
>>> Personally I would have loved to meet comments like mine in other
>>> areas of Wine: knowing what data read or written by an algorithm are
>>> meant to represent makes it much easier to understand why the
>>> algorithm does what it does. I agree that this is not the most
>>> complicated case around, but I thought this would be helpful for
>>> someone not already knowing my code. OTOH I understand that also a
>>> high-level description is useful: would you consider it satisfying to
>>> have both?
>>
>> Eh, maybe. I would at least start with the high-level explanation. When
>> I read the above comment my eyes kind of glaze over, and it doesn't tell
>> me anything that the code already doesn't (even the struct definitions
>> alone tell me as much without checking how they're used.)
>>
>> I dunno, I'll see what you end up coming up with.
> 
> Ok, I'll do my best.
> 
>> Er, sorry, I should have said intptr_t. I mean to return intptr_t from
>> the rbtree compare callback, and then all you need is
>>
>>       return (intptr_t)key - (intptr_t) variable->var;
> 
> Would that really work? For one thing, signed overflow is undefined
> behavior in C (except possibly very recent versions which we're
> definitely not using). It doesn't seem we're using -fwrapv, does it?
> 
> Even assuming two's complement semantics, this doesn't seem to give an
> order: for example, 0 > (INT_MIN + 1), because 0 - (INT_MIN + 1) =
> INT_MAX > 0 and INT_MAX > 0, because INT_MAX - 0 = INT_MAX > 0, but
> (INT_MIN + 1) - INT_MAX = 2 > 0, therefore you'd have (INT_MIN + 1) >
> INT_MAX. Therefore the relationship is not transitive, which I am pretty
> sure the red-black tree implementation won't like.
> 
> Am I missing something?

Actually I think you're right, but I didn't realize because I've seen 
that idiom used elsewhere. Specifically, compare_register_range() and 
compare_descriptor_range() in libs/vkd3d/state.c are definitely broken, 
and we have a few other comparison functions which are only not broken 
by virtue of implicit bounds.

Unless we're both missing something...

> 
>> Ah. I assumed that you would actually be indexing right before copyprop,
>> and didn't check whether that was actually the case.
>>
>> Although on reflection now I'm not sure if we *should* index. We only
>> dump the IR once, at the end, and I feel like we risk confusion by
>> printing instruction indices that won't match. Hmm, maybe the code makes
>> sense as it is...
> 
> You're right, I'd say at this point it's better to just ignore the index
> and always use the pointer, so there is no confusion with the indices in
> the final dump. This also means that I can put the %p directly in the
> TRACE without the additional buffer that you didn't like (and removing
> like 10 lines of code used just for tracing).
> 
>> I'm typically in favor of leaving in traces if they were helpful during
>> debugging.
>>
>> For what it's worth, I'd also be in favor of a "debugstr_node" helper or
>> something. Maybe even if it uses a fixed-size buffer, although I'm not
>> sure that Matteo or Henri will agree with that ;-)
> 
> If you accept my solution above, this has just been swept away.

Eh, I guess. The nice thing about printing indices is that you can just 
add an hlsl_dump_function() call right before the relevant function, and 
then it's a lot easier to see what the nodes are (whereas currently we 
basically never print node pointers).

I'm not sure we want that call to be in by default, but if someone needs 
to debug copyprop it'd be very easy to add.

> 
>>>> Shouldn't this function just return void?
>>>
>>> It's a matter of philosophy. If you ask me, the switch in
>>> copy_propagation_recursive should be rather oblivious of what the
>>> various helpers actually do, it shouldn't have an insight that
>>> copy_propagation_store never returns true. For its point of view, any
>>> helper can potentially modify the code. Though I can change to void if
>>> you prefer.
>>
>> It's a fair point, but I guess I look at copy_propagation_recursive()
>> and think "recording stores should never make any progress; all the
>> magic happens in rewriting loads". This becomes especially true if this
>> function gets renamed to copy_prop_record_store() or something, which I
>> think it should.
> 
> Ok, you win on this one.
> 
> Thanks again, Giovanni.
>