HLSL offsetting

Francisco Casas fcasas at codeweavers.com
Mon Jun 13 10:59:42 CDT 2022


Hello,

>> Functions that cannot receive structs or arrays may receive a 
>> "flattened" component offset that can then be translated into a route, 
>> other functions would require the route as an array.
> 
> Not immediately sure what functions you're thinking of, but I imagine 
> things like hlsl_compute_component_offset() would now have to translate 
> the offset into an array.
> 

For instance, hlsl_new_load(), hlsl_new_resource_load(), and 
hlsl_new_store() will have to receive routes instead of single nodes as 
offsets now.
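
As a rough illustration of what translating a "flattened" component offset into a route could look like, here is a minimal sketch. The sketch_type structure and both function names are hypothetical and far simpler than the real struct hlsl_type. The route is assumed to be a plain array of constant indices whose last entry is the component index inside the final numeric type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified type description; not vkd3d's struct hlsl_type. */
enum sketch_kind { SKETCH_NUMERIC, SKETCH_ARRAY, SKETCH_STRUCT };

struct sketch_type
{
    enum sketch_kind kind;
    unsigned int component_count;             /* NUMERIC: scalar components. */
    const struct sketch_type *element;        /* ARRAY: element type. */
    unsigned int element_count;               /* ARRAY: number of elements. */
    const struct sketch_type *const *fields;  /* STRUCT: field types. */
    unsigned int field_count;                 /* STRUCT: number of fields. */
};

/* Total number of scalar components contained in a type. */
static unsigned int sketch_component_count(const struct sketch_type *type)
{
    unsigned int i, count = 0;

    switch (type->kind)
    {
        case SKETCH_NUMERIC:
            return type->component_count;
        case SKETCH_ARRAY:
            return type->element_count * sketch_component_count(type->element);
        case SKETCH_STRUCT:
            for (i = 0; i < type->field_count; ++i)
                count += sketch_component_count(type->fields[i]);
            return count;
    }
    return 0;
}

/* Translate a flattened component index into a route: one index per nesting
 * level, followed by the component index within the innermost numeric type.
 * Returns the number of route entries written. */
static size_t sketch_route_from_component(const struct sketch_type *type,
        unsigned int component, unsigned int *route, size_t capacity)
{
    size_t depth = 0;
    unsigned int i, size;

    assert(component < sketch_component_count(type));

    while (type->kind != SKETCH_NUMERIC)
    {
        assert(depth < capacity);
        if (type->kind == SKETCH_ARRAY)
        {
            size = sketch_component_count(type->element);
            route[depth++] = component / size;
            component %= size;
            type = type->element;
        }
        else
        {
            /* Skip whole fields until the one containing the component. */
            for (i = 0; ; ++i)
            {
                assert(i < type->field_count);
                size = sketch_component_count(type->fields[i]);
                if (component < size)
                    break;
                component -= size;
            }
            route[depth++] = i;
            type = type->fields[i];
        }
    }

    assert(depth < capacity);
    route[depth++] = component;
    return depth;
}
```

Functions that need the route as an array would take the route and length directly, while functions that only see numeric types could keep passing the single flattened index and convert it on demand.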

>> At this point I can see the benefits of (b) over (a), but also, 
>> several complications that may arise (you have pointed most of them):
>> - We will have to translate many things that are already in terms of 
>> register offsets into component offsets.
> 
> How many things, though? As far as I can see it's:
> 
> - copy-prop
> 
> - copy splitting
> 
> - hlsl_offset_from_deref() [which is part of the point of the whole 
> exercise]
> 
> That's pretty much it.
> 

Well, maybe it is not too much, but we also have to add initializers to 
the list, as well as every other place where the three functions 
mentioned above are called.

>> - Once we start supporting non-constant offsets, we may also want to 
>> introduce a common subexpression elimination pass for the registers 
>> offsets expressions that will arise (currently, the creation of common 
>> expressions is mainly avoided by the recursive structure of the split 
>> passes).
> 
> What cases are you thinking of that would want CSE?
> 

Matteo pointed it out. Basically, suppose we have several copies that 
come from a deeply nested type, let's say a 4-dimensional array:

float4 arr[10][10][10][10];

Consider the following 2 loads:

arr[i][j][k][0]
arr[i][j][k][1]

The idea is that the common part of the SMx-specific register offsets 
(4000 * i + 400 * j + 40 * k)
is shared among both and not computed twice.
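
To make the arithmetic concrete, here is a minimal sketch of what the CSE pass would achieve for these two loads. It assumes register offsets are counted in components (4 per float4), which is what the 4000/400/40 coefficients above imply; the function and struct names are hypothetical.

```c
#include <assert.h>

/* Assumption: register offsets are in components, so a float4 takes 4. */
enum { SKETCH_FLOAT4_REG_SIZE = 4 };

/* Without CSE: the full expression is evaluated for every load of
 * float4 arr[10][10][10][10]. */
static unsigned int sketch_naive_offset(unsigned int i, unsigned int j,
        unsigned int k, unsigned int l)
{
    return 4000 * i + 400 * j + 40 * k + SKETCH_FLOAT4_REG_SIZE * l;
}

struct sketch_offsets
{
    unsigned int first;   /* Offset of arr[i][j][k][0]. */
    unsigned int second;  /* Offset of arr[i][j][k][1]. */
};

/* With CSE: the part common to both loads is computed once and reused. */
static struct sketch_offsets sketch_shared_offsets(unsigned int i,
        unsigned int j, unsigned int k)
{
    unsigned int base = 4000 * i + 400 * j + 40 * k;
    struct sketch_offsets offsets;

    offsets.first = base + SKETCH_FLOAT4_REG_SIZE * 0;
    offsets.second = base + SKETCH_FLOAT4_REG_SIZE * 1;
    return offsets;
}
```

With non-constant i, j and k, each of those multiplications and additions becomes an instruction in the shader, which is why sharing the base matters.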

>> Solely because I have spent a considerable amount of time implementing 
>> option (a) (and some time giving up on implementing component offsets 
>> as a single hlsl_src, instead of a path) I am rushing (a) to see how 
>> the patch turns out in the end, before trying (b).
>>
>> I do think that (a) can be cleansed one step at a time. Even if the 
>> register sizes and offsets depend on the SM, we can write an interface 
>> to abstract the rest of the code from using them directly, and gradually 
>> migrate the code that does to use this interface instead.
>>
>>
>> But so far, yeah, I am being convinced that (b) is better, however 
>> more difficult.
>> If we do (b), I suggest we try to do it in separate steps, with some 
>> "scaffolding code":
>>
>> - Add the component offset route to hlsl_deref without deleting the 
>> register offset.
>> - Create a simple pass that initializes the register offset field 
>> using the route in all the hlsl_derefs.
>> - Translate all the parse code first to work with the routes, and 
>> apply the pass just after parsing.
>> - Translate some more compilation passes and move the translation pass 
>> forward.
>> - Repeat until all the passes that can be written in terms of 
>> component offsets are.
>> - Write the SMxIRs and the SMxIR translations.
>> - Only then, remove the register offset from hlsl_deref and the 
>> translation pass from the codebase.
>>
>>
>> Otherwise we may end up writing a very big patch that may take too 
>> long to complete (!).
> 
> Ech, I think that trying to do two things at once is going to be more 
> confusing than the alternative. I also don't think there's *that* much 
> code that we'd have to change, i.e. a monolithic patch wouldn't be that 
> bad.
> 

Okay, I stopped working on multiple register offsets and now I am 
working on this approach.

I still think it is a good idea to add a pass that translates these 
"route"s to the original register "offset"s, so that we can implement 
this change gradually. We can move the pass forward as we change more 
passes to work with component offsets, and it becomes easier to debug 
the things that we don't translate correctly.
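
For what it is worth, the transitional pass could be as small as the following sketch. It is deliberately restricted to nested arrays with constant indices, and sketch_deref, its fields, and both functions are hypothetical stand-ins rather than the real hlsl_deref. The element register size is passed in because it depends on the SM.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 8

/* Hypothetical deref carrying both representations during the transition. */
struct sketch_deref
{
    unsigned int dims[SKETCH_MAX_DEPTH];   /* Array sizes, outermost first. */
    unsigned int route[SKETCH_MAX_DEPTH];  /* Constant index per dimension. */
    size_t depth;                          /* Number of dimensions used. */
    unsigned int reg_offset;               /* Legacy field, filled by the pass. */
};

/* Compute the register offset that corresponds to a route, given the
 * SM-specific register size of the innermost element. */
static unsigned int sketch_reg_offset_from_route(
        const struct sketch_deref *deref, unsigned int elem_reg_size)
{
    unsigned int offset = 0, stride = elem_reg_size;
    size_t i;

    assert(deref->depth <= SKETCH_MAX_DEPTH);

    /* Walk from the innermost dimension outwards, growing the stride. */
    for (i = deref->depth; i-- > 0;)
    {
        assert(deref->route[i] < deref->dims[i]);
        offset += deref->route[i] * stride;
        stride *= deref->dims[i];
    }
    return offset;
}

/* The transitional pass: initialize the legacy register offset of every
 * deref from its route, so later passes can keep using reg_offset. */
static void sketch_lower_routes(struct sketch_deref *derefs, size_t count,
        unsigned int elem_reg_size)
{
    size_t i;

    for (i = 0; i < count; ++i)
        derefs[i].reg_offset = sketch_reg_offset_from_route(&derefs[i],
                elem_reg_size);
}
```

Everything before the point where this runs only looks at the route, and everything after it only looks at reg_offset, so moving the call later in the pipeline is what "moving the pass forward" amounts to.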

I am aiming to implement the new approach for everything up to and 
including split_matrix_copies, and to put the translation pass right 
after it. So far it seems to be going nicely.

> For that matter, I'd be happy to try writing those patches myself :-)
> 

Give me a couple of days to finish a patch with the previous idea first. 
Even if we don't want to introduce the ugliness of this transitional 
state upstream, it may be a good starting point for the big patch.


> ἔρρωσθε,
> Zeb


