wineport: Add support for ctz().

Adam Martinson amartinson at codeweavers.com
Thu Mar 17 12:24:51 CDT 2011


On 03/17/2011 02:54 AM, Marcus Meissner wrote:
> On Wed, Mar 16, 2011 at 01:26:31PM -0500, Adam Martinson wrote:
>> On 03/16/2011 08:34 AM, Alexandre Julliard wrote:
>>> Adam Martinson <amartinson at codeweavers.com> writes:
>>>
>>>> @@ -239,6 +243,19 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
>>>>   int ffs( int x );
>>>>   #endif
>>>>
>>>> +#if defined(__GNUC__) && (GCC_VERSION >= 30406)
>>>> +    #define ctz(x) __builtin_ctz(x)
>>>> +#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
>>>> +    static inline int ctz( unsigned int x )
>>>> +    {
>>>> +        int ret;
>>>> +        __asm__("bsfl %1, %0" : "=r" (ret) : "r" (x));
>>>> +        return ret;
>>>> +    }
>>>> +#else
>>>> +    #define ctz(x) (ffs(x)-1)
>>>> +#endif
>>> There's no reason to add this. Just use ffs().
>>>
>> If I thought ffs() was adequate, I would.  I need this for iterating
>> sparse bitsets.
>>
>> __builtin_ctz() compiles to:
>> mov    0x8(%ebp),%eax
>> bsf    %eax,%eax
>>
>> (ffs()-1) compiles to:
>> mov    $0xffffffff,%edx
>> bsf    0x8(%ebp),%eax
>> cmove  %edx,%eax
>> add    $0x1,%eax
>> sub    $0x1,%eax
>>
>> ... Fortunately -O2 catches the add/sub.  So yes, there is a reason:
>> ctz() is at least 50% faster.
> You are optimizing in the wrong spot.
>
> If this is not in performance relevant code, readability is always better
> than hacks.
>
> ciao, Marcus
I agree 100% with the second part; this is for some of the functions that
top the CPU-time charts in wined3d.
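A minimal sketch of the sparse-bitset iteration mentioned above, assuming the bitset is an array of 32-bit unsigned int words. The names ctz_u32() and collect_set_bits() are hypothetical, not the actual wined3d or libport code. The inner loop only calls ctz on nonzero words, so the zero check that ffs() performs (the cmove in its assembly above) is wasted work here. Note that __builtin_ctz(0), like a bare BSF, gives an undefined result, so the nonzero guard is required, not optional.

```c
#include <assert.h>
#include <stddef.h>
#if !defined(__GNUC__)
#include <strings.h>  /* ffs(), POSIX; only needed for the fallback */
#endif

/* Count trailing zero bits of a NONZERO word (assumes 32-bit unsigned int).
 * Like __builtin_ctz() and BSF, the result is undefined for x == 0. */
static inline int ctz_u32(unsigned int x)
{
    assert(x != 0);
#if defined(__GNUC__)
    return __builtin_ctz(x);
#else
    return ffs((int)x) - 1;  /* ffs() is 1-based and returns 0 for 0 */
#endif
}

/* Hypothetical helper: store the index of every set bit of an nwords-long
 * bitset into out[] (which must be large enough) and return the count. */
static size_t collect_set_bits(const unsigned int *words, size_t nwords,
                               size_t *out)
{
    size_t i, n = 0;

    for (i = 0; i < nwords; ++i)
    {
        unsigned int w = words[i];

        while (w)  /* guard: ctz_u32() is never called with 0 */
        {
            out[n++] = i * 32 + ctz_u32(w);
            w &= w - 1;  /* clear the lowest set bit */
        }
    }
    return n;
}
```

Clearing the lowest set bit with w &= w - 1 makes the inner loop run once per set bit rather than once per bit position, which is where a cheap ctz() pays off for sparse sets.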



More information about the wine-devel mailing list