HLSL offsetting

Mon Jun 13 13:32:46 CDT 2022

On 6/13/22 10:59, Francisco Casas wrote:
> Hello,
> 
>>> Functions that cannot receive structs or arrays may receive a 
>>> "flattened" component offset that can then be translated into a 
>>> route, other functions would require the route as an array.
>>
>> Not immediately sure what functions you're thinking of, but I imagine 
>> things like hlsl_compute_component_offset() would now have to 
>> translate the offset into an array.
>>
> 
> For instance, hlsl_new_load(), hlsl_new_resource_load(), and 
> hlsl_new_store() will have to receive routes instead of single nodes as 
> offsets now.

They'll have to be adjusted, although there are probably simpler ways than 
passing arrays to them directly. We'll probably only find out by writing the patch.

>>> At this point I can see the benefits of (b) over (a), but also 
>>> several complications that may arise (you have pointed out most of them):
>>> - We will have to translate many things that are already in terms of 
>>> register offsets into component offsets.
>>
>> How many things, though? As far as I can see it's:
>>
>> - copy-prop
>>
>> - copy splitting
>>
>> - hlsl_offset_from_deref() [which is part of the point of the whole 
>> exercise]
>>
>> That's pretty much it.
>>
> 
> Well, maybe it is not too much, but we also have to add initializers to 
> the list, as well as every other place where the three functions above are called.
> 
>>> - Once we start supporting non-constant offsets, we may also want to 
>>> introduce a common subexpression elimination pass for the register 
>>> offset expressions that will arise (currently, the creation of 
>>> common expressions is mainly avoided by the recursive structure of 
>>> the split passes).
>>
>> What cases are you thinking of that would want CSE?
>>
> 
> Matteo pointed it out. Basically, if we have several copies that come 
> from a deep struct, let's say a 4-dimensional array:
> 
> float4 arr[10][10][10][10];
> 
> Consider the following 2 loads:
> 
> arr[i][j][k][0]
> arr[i][j][k][1]
> 
> The idea is that the part of the SMx-specific register offsets that 
> both loads have in common, (4000 * i + 400 * j + 40 * k), is computed 
> once and shared between them instead of being computed twice.

Hmm, interesting.

That would be avoidable if we had e.g. HLSL_IR_LOAD instructions, but 
that still brings us back to the problem where we have unused loads we 
can't DCE.

My inclination is that, yeah, if we end up caring about this, we should 
just do a late CSE pass.