[PATCH 2/3] d3dx9: Support relative addressing in preshader.

Thu Mar 23 18:39:01 CDT 2017

On 03/24/2017 01:57 AM, Matteo Bruni wrote:
> 2017-03-23 16:06 GMT+01:00 Paul Gofman <gofmanp at gmail.com>:
>> Signed-off-by: Paul Gofman <gofmanp at gmail.com>
>> ---
>>   dlls/d3dx9_36/preshader.c    | 103 ++++++++++++++++++++++++++++++++++++-------
>>   dlls/d3dx9_36/tests/effect.c |  38 ++++++++--------
>>   2 files changed, 106 insertions(+), 35 deletions(-)
>>
>> diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c
>> index 9aedc99..2974ffd 100644
>> --- a/dlls/d3dx9_36/preshader.c
>> +++ b/dlls/d3dx9_36/preshader.c
>> @@ -191,6 +191,8 @@ struct d3dx_pres_operand
>>       /* offset is component index, not register index, e. g.
>>          offset for component c3.y is 13 (3 * 4 + 1) */
>>       unsigned int offset;
>> +    enum pres_reg_tables base_reg_table;
>> +    unsigned int base_reg_offset;
>>   };
> It would be nicer to create a struct for storing together table and
> offset and to pass it around to the various functions taking both
> arguments, like parse_pres_reg().
> It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a 
floating constant preshader input. Maybe I just remove this variable and 
check with FIXME during parsing?
>
>> +    size = rs->table_sizes[table] * table_info[table].reg_component_count;
>> +    if (offset >= size)
>> +    {
>> +        if ((offset & 255) < size)
>> +            offset &= 255;
>> +        else
>> +            offset = offset % size;
>> +    }
> Is this just to try and avoid the modulo operation in some cases or is
> there a deeper reason?
> If the former, you could extend that to all the power-of-two sizes by
> first checking if that's the case (!(size & size - 1)) and, if so,
> and-ing offset with the mask (size - 1). That's assuming the compiler
> doesn't do the same optimization already.
>
> Let me know if there is some other reason instead.
Just doing (offset % size) will lead to different results and will break 
the tests. The logic in this out of range index checking is a bit mind 
breaking.
>
>>   static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp)
>>   {
>> -    if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) / table_info[opr->table].reg_component_count))
>> -        WARN("Using uninitialized input, table %u, offset %u.\n", opr->table, opr->offset + comp);
>> +    unsigned int base_offset, index;
>> +
>> +    if (opr->base_reg_table == PRES_REGTAB_COUNT)
>> +        index = 0;
>> +    else
>> +        index = (int)exec_get_reg_value(rs, opr->base_reg_table, opr->base_reg_offset);
> I think it's better with lrint() in place of the cast.
I just want to note that whenever the input is from floating values FX 
compiler generates preshader instructions which do the rounding, and 
skips that if the base register comes from integers. Do you think we 
need to do extra rounding in this case?
>
>> +    if (index >= rs->table_sizes[opr->table] && opr->table == PRES_REGTAB_CONST)
>> +    {
>> +        unsigned int size_pow2;
>> +
>> +        for (size_pow2 = 1; size_pow2 < rs->table_sizes[opr->table]; size_pow2 <<= 1)
>> +            ;
>> +        index &= (size_pow2 - 1);
>> +        if (index >= rs->table_sizes[opr->table])
>> +            return 0.0;
>> +    }
> I don't think we necessarily have to follow native's madness on buffer
> overflows to the letter. Since you did, that's certainly fine though.
>
     I think I finally guessed the principle how native d3dx clamps 
indexes, at least finally I got stable match for the tests involving any 
index numbers. I thought it could make some sense as we sometimes see 
apps depending on out of bounds access behaviour, especially if native 
implementation can return zeroes (either stable or by chance like here).

     As I can guess the logic behind it, it has some allocation size on 
the floating constant table which is the power of 2, and it first 
ultimately puts the index to this allocation range with modulo 
operation. Then, if the index is bigger than the actual number of 
constants, it returns zero. This logic is a bit different for indexing 
immediate constants though (which is used when indirect addressing 
selects something from vector or matrix), which never returns ultimate 
zeroes, probably because the memory allocation for immediate constants 
is always exact and is not done in chunks. I have no clear idea though 
why (offset & 255) < size is a special case, can only guess than maybe 
it is related somehow to casting offset to byte at some point.