Faster TlsAlloc() or zero_bit_scan

David Laight david at
Fri Feb 11 11:03:15 CST 2005

On Thu, Feb 10, 2005 at 08:12:39PM +0000, Mike Hearn wrote:
> On Thu, 10 Feb 2005 18:59:21 +0100, Dietrich Teickner wrote:
> > I have a suggestion for a faster implementation of the zero_bit_scan in
> > RtlFindClearBits [NTDLL.@]
> > (rtlbitmap.c), used by e.g. TlsAlloc().
> > The main point is the use of the instruction 'bsf  eax, eax'.
> > 
> > This I have implemented in the new experimental odinxp-tree for finding 
> > the first zero_bit in the first 'bytecount' bytes of the bitmap addr.
> Does this actually make a noticeable difference? Rewriting stuff in
> assembly for theoretical performance improvements isn't so great, as far
> fewer people can read/write assembly than C.

I'd also add that you need to check that using 'bsf' is EVER a gain!
An i386 might execute it faster than the corresponding C, but there
is no guarantee that a P4 or Athlon will.
Oh, and you need to run any tests with the code out of the cache,
since that is how it will usually be called.
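
For what it's worth, here is a rough sketch of the two things you would
be timing against each other. The function names are made up for
illustration; this is neither the Wine code nor the odin code. It assumes
gcc, whose __builtin_ctz normally compiles to bsf on x86, and it assumes
bit 0 of each byte is the lowest-numbered bit.

```c
#include <stddef.h>

/* Hypothetical helper, not Wine's RtlFindClearBits: returns the index of
 * the first zero bit in the first 'bytecount' bytes at 'addr', or -1 if
 * every bit is set. Plain C, one bit at a time. */
static long first_zero_bit_c(const unsigned char *addr, size_t bytecount)
{
    size_t i;

    for (i = 0; i < bytecount; i++) {
        unsigned char b = addr[i];
        if (b != 0xff) {
            long bit = 0;
            while (b & 1) {     /* skip over the set low bits */
                b >>= 1;
                bit++;
            }
            return (long)(i * 8) + bit;
        }
    }
    return -1;
}

/* Same result, but the per-byte scan is done with __builtin_ctz on the
 * inverted byte (assumption: gcc or a compatible compiler). */
static long first_zero_bit_ctz(const unsigned char *addr, size_t bytecount)
{
    size_t i;

    for (i = 0; i < bytecount; i++) {
        unsigned int inv = (unsigned char)~addr[i];
        if (inv != 0)           /* __builtin_ctz(0) is undefined */
            return (long)(i * 8) + __builtin_ctz(inv);
    }
    return -1;
}

/* Check that both versions agree before timing anything. */
static int versions_agree(const unsigned char *addr, size_t bytecount)
{
    return first_zero_bit_c(addr, bytecount)
        == first_zero_bit_ctz(addr, bytecount);
}
```

Timing these in a tight loop only tells you how they behave with
everything already cached, which is why the cold case needs measuring
separately.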


David Laight: david at

More information about the wine-devel mailing list