[PATCH] d3dx9: Improve performance and memory usage in preshader constants setting.

Tue May 31 16:48:29 CDT 2016

2016-05-31 11:48 GMT+02:00 Paul Gofman <gofmanp at gmail.com>:
> On 05/31/2016 12:24 AM, Matteo Bruni wrote:
>> This seems overly complicated and fragile.
>> Instead of this kind of "5 instructions bytecode" isn't it easier to
>> just store and then lookup a struct with all the info you need?
> Keeping a struct with offsets & count altogether is of course more
> straightforward, but I intentionally did it this way to save a lot of
> extra space and to introduce an easier coalescing. When there is a
> matrix transpose involved (which happens quite often in real apps as far
> as I have seen) the whole structure would be duplicated for every matrix
> element. In the current approach each element for transposed matrix has
> just one offset (destination offset changes sequentially and does not
> come to the array for each matrix element) and one count of 1. The other
> way to optimize this is to introduce a separate operation for transposed
> matrix copy, but will it really be simpler or nicer than it is now?
> Current approach is indeed a sort of encoding the data copy into a very
> simple bytecode. Keeping the whole struct and optimizing as a separate
> op for matrix transpose would be the same in principle, but I am afraid
> will result in a bit more code and a bit more mess as there will be more
> cases to handle. I thought the current approach is simpler as the data
> copy is encoded in a generic and simple way so on the actual copy I do
> not have to care for any specifics of initial parameter & constant
> layout, as well as in process of extending the array & coalescing.

I don't see why just moving the part of set_constants() which
generates the set_state structures for each "copy" to init time would
be more complicated than the current solution. It might take some more
memory (not a whole lot more, AFAICS) but other than that it seems
strictly nicer to me. Maybe it's a case of "beauty is in the eye of
the beholder", I don't know...
The issue with transposed matrices seems mostly orthogonal to the
encoding of the constant uploads and I agree it's important. What
about simply adding a transpose flag and the stride of the matrix and
encode it all in one struct (or something like a
COPY_STATE_TYPE_COPY_TRANSPOSED op for your approach)? It seems it
should be good from a practical standpoint and it should reduce the
number of operations (encoded either way) you have to store by a lot.

>> I
>> guess the end result would be something akin to storing an array of
>> struct d3dx_const_copy_state in place of struct d3dx_const_param_set.
>> I don't know how the code would exactly look like but I would start by
>> moving building all the struct d3dx_const_copy_param to init time.
> I am not sure I understand what you mean here, could you please clarify?
> In the variant currently suggested the whole building of structure
> responsible for parameter copy is already done solely at init time.
> Shader/preshader full constant structure is not stored anymore. There is
> still an array of parameters and constant description left which is
> currently used in effect.c code for setting sampler states. This could
> easily be optimized further by not storing the parameters other than
> samplers and storing just register index instead of the whole constant
> descs for them, but this looked as a separate step to me not fully
> related to this patch. Or probably I am misunderstanding your point here.

Yeah, that can and should certainly be a separate patch.
What I mean is storing the structs d3dx_const_copy_state you compute
in set_constants() (as they are right before actually setting the
values) in struct d3dx_const_tab in place of the structs
d3dx_const_param_set you use to generate them.
Does it make sense?

>> BTW
>> the coalescing you currently do in add_const_set() could be
>> potentially done as a separate pass after you generate all the
>> "d3dx_const_copy_state-like" structures (but still at init time),
>> which might be simpler.
> It can be done as a separate pass, but why? In my understanding it will
> result just in bigger intermediate array and a few more lines of code. I
> will need to do all the same but not at once when getting an "copy
> request" but from scanning the pre-stored array. Or am I missing something?

No, that's right. It is just an alternative option which I mentioned
because it might be simpler or more powerful (e.g. it makes possible
to coalesce multiple copies which are not next to each other - it
probably wouldn't help much at the moment since you can't coalesce
accesses from multiple parameters) in some case. Feel free to take or
ignore it.