[PATCH vkd3d v2 03/10] vkd3d-shader/hlsl: Set component count for objects to 1. --- Francisco, I left a test marked as todo because it still fails on master, because of missing object copy.

Thu May 12 17:49:28 CDT 2022

On 5/10/22 08:31, Giovanni Mascellani wrote:
> Hi,
> 
> sorry, I made a mess with the commit message and some of it remained in
> the subject line.
> 
> At any rate, I think this patch still requires some thought: changing
> hlsl_type_component_count(), while making sense in itself, impacts
> hlsl_compute_component_offset().
> 
> More in general, I think there are two different concepts that are still
> a bit too confused in the code:
>    * the "logical" component count, i.e., how many things you have to put
> into an initializer;
>    * the "physical" register count, i.e., how many register that type
> will require.
> 
> Most of base types take one component and one register, which is
> probably why we're confusing them right now. But objects don't (they
> take a component, but no registers, because they are pure entities, they
> don't have a value). I think doubles wouldn't either if we supported
> them (they take one component, but two registers).
> 
> Also, components are always packed (as far as I know), while registers
> have padding we have to take into account (for example, a matrix row is
> padded to a whole four registers; same for an array item; but these
> rules depend on the SM).
> 
> Therefore I think we should distinguish more clearly which of these two
> concept is used each time, and be sure that we're computing correctly
> both offsets and sizes based on either of them.
> 
> Consider for example:
> 
> struct foo
> {
>       Texture2D t;
>       float x;
> };
> 
> With this patch the field x would go to register offset 1, which doesn't
> look correct.
> 
> BTW, I just made up the words "component" and "register", and I think
> they're wrong already, because I guess that a register is technically
> four things of what I called "register" in this email. Maybe HLSL/DXBC
> gurus have better word proposals.
> 

Yes, quite. Without trying to hold things up, it's probably past time we 
fixed this in a better way. Adding hlsl_fixmes for object arrays is 
feasible but has proven somewhat awkward.

The above summary is about accurate, although I will nitpick one 
thing—objects *do* have registers; they're just allocated in a different 
namespace and according to different rules.

There was some private discussion between myself and Matteo about this, 
but I think the correct thing to do is to deal in HLSL *only* in terms 
of components.

In fact, I would go so far as to say that reg_size and reg_offset should 
go away (or perhaps be accessed only from hlsl_sm4.c and hlsl_sm1.c—but 
better just to invent a new helper to calculate them from scratch, and 
stop storing them in the hlsl_type structure, I think).

I would also go so far as to say that hlsl.c should stop caring about 
matrix majority, and just use the same component ordering for both 
row-major and column-major matrices.

There are a few interesting consequences here:

* Doing the translation from "component-based" offsets to 
"register-based" offsets (i.e. getting the alignment and namespaces 
right) is actually relatively trivial; as Francisco pointed out, we 
basically just need to use hlsl_compute_component_offset() in 
hlsl_offset_from_deref().

* One thing we should do, though, is make a point of watching for 
non-contiguous loads in hlsl_offset_from_deref(). Because of field 
alignment, I don't think we'll ever generate any, but that fact doesn't 
inhere in the IR, and accordingly I'd rather not even assert() that it's 
the case. (In fact, if we ignore matrix majority, it may end up being 
quite easy to "accidentally" generate non-contiguous loads.)

* in order to generate less terrible code, we are going to want to do 
vectorization passes, and probably other passes as well, on sm1/sm4 IR 
(henceforth "LLIR"). This doesn't necessarily need to happen yet, since 
we can generate correct code without such optimizations, but it is 
another reason that introducing a "proper" LLIR will need to happen, as 
has been talked about often.