HLSL offsetting

Zebediah Figura zfigura at codeweavers.com
Thu Jun 9 14:17:04 CDT 2022


On 6/9/22 04:04, Matteo Bruni wrote:
>> The ugliness that we've run into is: how do we emit IR for the following
>> variable load?
>>
>>       struct apple
>>       {
>>           int a;
>>           struct
>>           {
>>               Texture2D b;
>>               int c;
>>           } s;
>>       } a;
>>
>>       /* in some expression */
>>       func(a.s);
>>
>> Unlike the SM1 example above, the register numbers don't match up.
>> Separately, it's kind of ugly that backend-specific details regarding
>> register size and alignment are leaking into the frontend so much.
> 
> I think most of that can be hidden or contained with some proper
> abstraction. And generous handwaving.
> But basically, that probably could be represented in the IR as copying
> around individual fields of the structure separately, rather than a
> single "struct deref". Clearly it can become more complex depending on
> the type of the variable but I think it should be doable.

Yeah, it could. Like I said it's not prohibitive. I'm just not sure it's 
the best option at this point.

It's worth pointing out that, at parse time, we both want and need load 
instructions (and therefore probably also store instructions) to have 
larger-than-vector types; that is, load instructions can produce structs, 
and store instructions can consume them. But we don't want that for 
SMxIR, and I believe we don't want that for the "final form" of HLSL IR 
either. That's the way the code is currently arranged and I see no 
reason not to keep it that way.

> 
>> Similarly, the amount of code that has to deal with matrix majority is
>> unfortunate.
> 
> That personally seems more annoying. Although it's not clear to me
> that handling matrix majority at a later stage is necessarily any
> better.

The main idea is that we could handle it closer to once (well, once per 
backend), at HLSL -> SMx translation.

That doesn't necessarily mean requiring that all matrix loads and stores 
are done on a single scalar; after all, we could translate a single 
vector load to multiple MOV instructions if it can't actually be 
represented by one.
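
To make the multiple-MOV case concrete, here is a minimal sketch in C. It 
assumes that a register holds four components and that a matrix is stored 
one major-axis vector per register (one register per column for 
column-major, one per row for row-major). The names matrix_layout, 
reg_location, locate_element and movs_for_row_load are hypothetical and 
are not taken from the existing compiler code.

```c
#include <stdbool.h>

/* Hypothetical model: a register holds 4 components, and a matrix is
 * stored one major-axis vector per register. */
struct matrix_layout
{
    unsigned int rows, cols;
    bool row_major;
};

struct reg_location
{
    unsigned int reg;       /* register index relative to the matrix start */
    unsigned int component; /* component index within that register */
};

/* Map matrix element (row, col) to the register and component holding it. */
static struct reg_location locate_element(const struct matrix_layout *m,
        unsigned int row, unsigned int col)
{
    struct reg_location loc;

    if (m->row_major)
    {
        loc.reg = row;
        loc.component = col;
    }
    else
    {
        loc.reg = col;
        loc.component = row;
    }
    return loc;
}

/* Count the MOVs needed to load one row as a vector: one per distinct
 * source register that the row touches. */
static unsigned int movs_for_row_load(const struct matrix_layout *m,
        unsigned int row)
{
    unsigned int col, count = 0, prev_reg = ~0u;

    for (col = 0; col < m->cols; ++col)
    {
        struct reg_location loc = locate_element(m, row, col);

        if (loc.reg != prev_reg)
        {
            ++count;
            prev_reg = loc.reg;
        }
    }
    return count;
}
```

Under these assumptions, loading a row of a row-major matrix takes a single 
MOV, while loading a row of a column-major matrix takes one MOV per 
column, since each component of the row sits in a different register.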

It does potentially mean doing vectorization passes on SMxIR, though. 
Hard to tell this far in advance, and it's also hard to tell if that's 
something we're going to need anyway.

> 
>> The former problem can potentially be solved by embedding multiple
>> register offsets into hlsl_deref (one per register type). Neither this
>> nor the latter problem are prohibitive, and I was at one point in favour
>> of continuing to use register offsets everywhere, but at this point my
>> feeling has changed, and I think using register offsets is looking more
>> ugly than the alternatives. I get the impression that Francisco
>> disagrees, though, which is why we should probably hash this out now.
> 
> As I mention below, I currently see two options as the most appealing.
> This one (multiple register offsets) sits somewhat in the middle and
> it feels like it would be best to go to one of the extremes instead.
> It's also possible that this middle ground solution would end up being
> nicer in practice. At any rate, I certainly wouldn't flat out discount
> it.
> 
>> Nor do I think we should use both register offsets and component offsets
>> (either in the same node type, or in different node types). That just
>> makes the IR way more complicated. Rather, I think we should be doing
>> everything in *just* component offsets until translation from HLSL IR to
>> SMx IR.
> 
> I touched on this earlier and I agree that the additional complexity
> is unlikely to be worth it. Admittedly we're in a limbo right now
> where SMxIR isn't quite there yet, which makes reasoning on some of
> these details a bit fuzzy.
> 
>> In order to deal with the problem of translating dynamic offsets from
>> components to registers, I see three options:
>>
>> (a) emit code at runtime, or do some sophisticated lowering,
>>
>> (b) use special offsetof and sizeof nodes,
>>
>> (c) introduce a structured deref type, much like [1]. Francisco was
>> actually proposing something like this, although with an array instead
>> of a recursive structure, which strikes me as an improvement.
>>
>> My guess is that (a) is very hard. I haven't really tried to reason it
>> out, though.
>>
>> Given a choice between (b) and (c), I'm more inclined to pick (c). It
>> makes the IR structure more restrictive, and those restrictions
>> fundamentally match the structured nature of the language we're working
>> with, both things I tend to like.
> 
> After giving it some thought I think that's certainly fine *for the
> higher level IR*. At the same time it seems to me that, if we go that
> route, eventually we also want to have real SMxIR with register
> offsets, and make sure that we can optimize constant offsets (thus
> expressions) at that level.
> 
> As I see it (as of current time and date, can't guarantee that I won't
> change my mind again...) we either push the backend-specific info up
> (register offsets all the way) or down (component offsets with
> structured deref / type info in the generic IR, transformation into
> register offsets in the SMxIR). I think either option works and it's
> mostly a matter of preference and which one fits / feels better with
> the rest of the compiler.

Yeah, that general approach makes sense to me. And yes, of course the 
SMxIR should deal entirely in register offsets.
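
As a rough illustration of what "component offsets with a structured 
deref" could look like, here is a minimal sketch in C. Everything in it 
is hypothetical: the type model, the names (resolve_path, reg_size and so 
on), the choice to count an object as one component, and in particular 
the register rule, which simply gives every vector one whole numeric 
register and every texture one texture register. Real shader model 
packing rules are more involved. The deref is an array of field indices, 
resolved once per domain.

```c
#include <stddef.h>

enum reg_set { REG_NUMERIC, REG_TEXTURE, REG_SET_COUNT };
enum type_kind { KIND_VECTOR, KIND_TEXTURE, KIND_STRUCT };

/* Hypothetical, heavily simplified type model. */
struct type
{
    enum type_kind kind;
    unsigned int width;               /* KIND_VECTOR: component count */
    unsigned int field_count;         /* KIND_STRUCT */
    const struct type *const *fields; /* KIND_STRUCT */
};

/* Size in components. Assumption: an object counts as one component. */
static unsigned int component_count(const struct type *t)
{
    unsigned int i, n = 0;

    if (t->kind != KIND_STRUCT)
        return t->kind == KIND_VECTOR ? t->width : 1;
    for (i = 0; i < t->field_count; ++i)
        n += component_count(t->fields[i]);
    return n;
}

/* Size in registers of one set, under the simplified rule above. */
static unsigned int reg_size(const struct type *t, enum reg_set set)
{
    unsigned int i, n = 0;

    switch (t->kind)
    {
        case KIND_VECTOR:
            return set == REG_NUMERIC ? 1 : 0;
        case KIND_TEXTURE:
            return set == REG_TEXTURE ? 1 : 0;
        case KIND_STRUCT:
            for (i = 0; i < t->field_count; ++i)
                n += reg_size(t->fields[i], set);
            return n;
    }
    return 0;
}

struct resolved
{
    unsigned int component;          /* offset in components */
    unsigned int reg[REG_SET_COUNT]; /* offset in each register set */
};

/* Resolve a path of field indices by summing the sizes of all preceding
 * siblings at each level. Assumes the path is valid for the root type. */
static struct resolved resolve_path(const struct type *root,
        const unsigned int *path, size_t path_len)
{
    struct resolved r = {0};
    const struct type *t = root;
    unsigned int j;
    size_t i;
    int set;

    for (i = 0; i < path_len; ++i)
    {
        for (j = 0; j < path[i]; ++j)
        {
            r.component += component_count(t->fields[j]);
            for (set = 0; set < REG_SET_COUNT; ++set)
                r.reg[set] += reg_size(t->fields[j], (enum reg_set)set);
        }
        t = t->fields[path[i]];
    }
    return r;
}
```

Applied to the struct apple example quoted earlier, the path {1} (a.s) 
resolves to component offset 1, numeric register offset 1 and texture 
register offset 0, which is the mismatch between register sets in a 
nutshell. The generic IR would only ever need the component offset; the 
per-set register offsets would be computed at translation to SMxIR.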

My current vision of SMxIR is that it should be a one-to-one 
representation of actual instructions, writable without any lowering 
passes (and hence any passes that are done on it should be optimization 
only, with the *possible* exception of register allocation). In a sense, 
it's what we have already with sm4_instruction and such, except that 
we'd be storing it and doing passes on it rather than just writing it 
directly.

As for choosing between those two extremes: what we currently have 
basically *is* the first extreme, with register offsets pushed all the 
way up to parse time. It's just causing enough friction that I think the 
latter extreme is probably going to be prettier.

> 
>> Note that either way we're going to need specialized functions to
>> resolve deref offsets in one step. I also think that should depend on
>> the domain—e.g. for copy-prop we'll actually want to do everything in
>> component counts, but when translating to SMxIR we'll evaluate given the
>> register alignment constraints of the shader model. In the case of (b)
>> it's not going to be as simple as running the existing constant folding
>> pass, because we can't actually fold the sizeof/offsetof constants
>> (unless we dup the node list, evaluate, and then fold, which seems very
>> hairy and more work than the alternative).
> 
> Right, each option will have different tradeoffs WRT optimization
> passes. But e.g. copy-prop should be doable even with register
> offsets, we "just" need to make sure to always map the component
> offsets to their respective register offsets.

Quite, in fact we're already doing it that way. But it's probably better 
to work with components, since we (a) don't waste space tracking padding 
[not very important], and (b) don't have to deal with multiple register 
sets [more important].

> 
>> I invite thoughts—especially from Matteo, since we discussed this sort
>> of problem ages ago.
> 
> Yep, hope that my comments make sense. I want to hear from the others too.
> 
>>
>> ἔρρωσθε,
>> Zeb
>>
>>
>> [1] https://www.winehq.org/pipermail/wine-devel/2020-April/164399.html
>>
>> [2] https://www.winehq.org/pipermail/wine-devel/2020-April/165493.html
>>


