wineport: Add support for ctz().
Adam Martinson
amartinson at codeweavers.com
Thu Mar 17 12:24:51 CDT 2011
On 03/17/2011 02:54 AM, Marcus Meissner wrote:
> On Wed, Mar 16, 2011 at 01:26:31PM -0500, Adam Martinson wrote:
>> On 03/16/2011 08:34 AM, Alexandre Julliard wrote:
>>> Adam Martinson <amartinson at codeweavers.com> writes:
>>>
>>>> @@ -239,6 +243,19 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
>>>> int ffs( int x );
>>>> #endif
>>>>
>>>> +#if defined(__GNUC__) && (GCC_VERSION >= 30406)
>>>> + #define ctz(x) __builtin_ctz(x)
>>>> +#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
>>>> + static inline int ctz( unsigned int x )
>>>> + {
>>>> + int ret;
>>>> + __asm__("bsfl %1, %0" : "=r" (ret) : "r" (x));
>>>> + return ret;
>>>> + }
>>>> +#else
>>>> + #define ctz(x) (ffs(x)-1)
>>>> +#endif
>>> There's no reason to add this. Just use ffs().
>>>
>> If I thought ffs() was adequate, I would. I need this for iterating
>> sparse bitsets.
>>
>> __builtin_ctz() compiles to:
>> mov 0x8(%ebp),%eax
>> bsf %eax,%eax
>>
>> (ffs()-1) compiles to:
>> mov $0xffffffff,%edx
>> bsf 0x8(%ebp),%eax
>> cmove %edx,%eax
>> add $0x1,%eax
>> sub $0x1,%eax
>>
>> ... Fortunately -O2 catches the add/sub. So yes, there is a reason:
>> ctz() is at least 50% faster.
> You are optimizing in the wrong spot.
>
> If this is not in performance-relevant code, readability is always better
> than hacks.
>
> ciao, Marcus
I agree 100% with the 2nd part; this is for some of the functions that
top the CPU time charts in wined3d.
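For reference, here is a minimal sketch of the use case mentioned above: iterating the set bits of a sparse bitset with a count-trailing-zeros primitive.

It assumes a GCC-compatible compiler for `__builtin_ctz()` and falls back to POSIX `ffs()` otherwise. The names `bit_ctz()` and `collect_set_bits()` are hypothetical illustrations, not Wine's actual API.

The listings quoted above differ because of the zero case. `ffs(0)` is defined to return 0, so `ffs(x)-1` needs the extra `cmove` for a zero input. `__builtin_ctz(0)` is undefined, so GCC can emit a bare `bsf`. The loop below only calls ctz on nonzero words, so the undefined case never arises.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <strings.h>  /* ffs() (POSIX) */

/* Bits per bitset word; 32 on i386/x86_64, where unsigned int is 32 bits. */
#define BITS_PER_WORD (sizeof(unsigned int) * CHAR_BIT)

/* Hypothetical helper: index of the lowest set bit of a NONZERO word.
 * Like __builtin_ctz(), the result is meaningless for x == 0. */
static inline int bit_ctz(unsigned int x)
{
    assert(x != 0);
#if defined(__GNUC__)
    return __builtin_ctz(x);          /* typically a single bsf/tzcnt on x86 */
#else
    return ffs((int)x) - 1;           /* portable fallback, as in the patch */
#endif
}

/* Hypothetical helper: write the index of every set bit in `words`
 * (n_words words) to `out`, in ascending order, and return the count.
 * The work done is proportional to the number of words plus the number
 * of set bits, not to the total number of bits. */
static unsigned int collect_set_bits(const unsigned int *words,
                                     unsigned int n_words,
                                     unsigned int *out)
{
    unsigned int count = 0;

    for (unsigned int i = 0; i < n_words; ++i)
    {
        unsigned int w = words[i];

        while (w)                     /* ctz is only ever called on nonzero w */
        {
            out[count++] = (unsigned int)(i * BITS_PER_WORD)
                         + (unsigned int)bit_ctz(w);
            w &= w - 1;               /* clear the lowest set bit */
        }
    }
    return count;
}
```

With a 32-bit `unsigned int`, consider the input words `{ 0x80000001u, 0x00000010u }`. For that input, `collect_set_bits()` reports bits 0, 31 and 36. The `w &= w - 1` step lets each iteration jump straight to the next set bit, which is why a cheap ctz matters in hot loops over sparse bitsets.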