wineport: Add support for ctz().
amartinson at codeweavers.com
Thu Mar 17 12:20:53 CDT 2011
On 03/17/2011 03:23 AM, David Laight wrote:
> On Wed, Mar 16, 2011 at 01:26:31PM -0500, Adam Martinson wrote:
>> __builtin_ctz() compiles to:
>> mov 0x8(%ebp),%eax
>> bsf %eax,%eax
>> (ffs()-1) compiles to:
>> mov $0xffffffff,%edx
>> bsf 0x8(%ebp),%eax
>> cmove %edx,%eax
>> So yes, there is a reason, ctz() is at least 50% faster.
> I'm not where you get 50% from!
> I've read both the intel and amd x86 instruction performance manuals
> (but can't clain to remember all of it!).
> The 'bsf' will be a slow instruction (with constraints on where it
> exectutes, and what can execute in parallel).
> The 'cmove' has even worse constraints since it can't execute until
> the 'flags' from the previous instruction are known.
> cmove is only slightly better than a mis-predicted branch!
> In this case there will be complete pipeline stall between the 'bsf'
> and the 'cmove'.
> ffs would probably execute faster with a forwards conditional branch
> (predicted not taken) in the 'return -1' path.
For my purposes the 'return -1' path will never be taken, so it's not
needed. I want to be able to do:
i = ctz(bitmap); /* get the LSB index */
bitmap ^= 1 << i; /* zero LSB */
/* for each set bit i... */
If there's a faster way to do this I'm all ears.
More information about the wine-devel