[PATCH 3/3] winex11: Use TINN algorithm to speed up colour lookups. (try 2)

Wed May 9 07:36:11 CDT 2007

> But I'm open to any ideas you may have as to how we could avoid using
> floats, yet not run into the overflow situations so easily. We could
> probably use division somewhere but I don't think that's actually any
> better performance-wise.

It is, when implemented via bit-shifts (very cheap).  I know Wine is
written in pure C, but with a little C++ template magic you should be
able to have fixed-point math that acts like floating point, with the
compiler tracking the binary point at compile time through the type
system and emitting only integer operations.  That still wouldn't be
as efficient as hand-designed fixed-point pumped through the
optimizing compiler.
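
To make the hand-designed variant concrete, here is a minimal sketch
in plain C.  The Q16.16 layout and the fix16 / fix_* names are made up
for illustration and are not taken from the patch; a real
implementation would pick the binary point to suit the value range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fraction bits. */
typedef int32_t fix16;
#define FIX_SHIFT 16
#define FIX_ONE   ((fix16)1 << FIX_SHIFT)

/* Integer -> fixed.  Multiplies instead of shifting because
 * left-shifting a negative value is undefined in C. */
static fix16 fix_from_int(int v)
{
    assert(v >= -32768 && v <= 32767);
    return (fix16)(v * FIX_ONE);
}

/* Fixed -> integer, truncating toward zero. */
static int fix_to_int(fix16 a)
{
    return a / FIX_ONE;
}

/* Multiply: widen to 64 bits so the intermediate product cannot
 * overflow, then shift the binary point back into place.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C but is what gcc does. */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> FIX_SHIFT);
}

/* Divide: pre-scale the dividend so the quotient keeps its
 * fraction bits. */
static fix16 fix_div(fix16 a, fix16 b)
{
    assert(b != 0);
    return (fix16)(((int64_t)a * FIX_ONE) / b);
}
```

Scaling by a power of two is what lets the compiler turn the
rescaling steps into shifts; only fix_div needs a real division.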

I'm pretty sure ints are still much better performance-wise.  That
article you cited is not telling the whole story; no operation on a
modern CPU is executed in a single cycle.

The aggregate throughput when considering optimal pipelining may be
one float instruction per cycle... but integer operations routinely
achieve several (on a Core 2 I believe up to four instructions can be
started into the pipelines per cycle, per core).  There are also
pipeline stalls and flushes to consider, where I believe further
investigation will again show much bigger performance hits on
floating-point.  And no mention of SIMD was made whatsoever, which
comes out as the fastest option in every discussion I've seen.
Furthermore SIMD has built-in support for things like
integer-multiply-keeping-only-the-most-significant-half and
multiply-add, at least on the x86 architecture, which if I'm not
mistaken is still a Wine requirement for running x86 PE files.
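
For reference, the two SIMD operations I mean can be sketched as
below.  This assumes SSE2 and its standard intrinsics from
<emmintrin.h> (_mm_mulhi_epi16 and _mm_madd_epi16); the wrapper names
mulhi8 and madd8 are hypothetical, and the scalar fallback is only
there so the sketch builds without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = upper 16 bits of the signed 32-bit product a[i] * b[i],
 * for eight 16-bit lanes at once. */
static void mulhi8(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mulhi_epi16(va, vb));
#else
    /* Scalar model; assumes >> on negatives is arithmetic. */
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(((int32_t)a[i] * b[i]) >> 16);
#endif
}

/* out[i] = a[2i] * b[2i] + a[2i+1] * b[2i+1], i.e. multiply eight
 * 16-bit pairs and add adjacent products into four 32-bit sums. */
static void madd8(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i]
               + (int32_t)a[2 * i + 1] * b[2 * i + 1];
#endif
}
```

The high-half multiply is exactly the "multiply, then shift the binary
point back" step of 16-bit fixed-point math, done for eight values in
one instruction.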

And when you try to save space by using 32-bit floats, you lose
everywhere.  A 32-bit floating-point store involves rounding and is
slow.  As a result, compilers keep intermediates at higher precision
for as long as possible, which means you lose repeatability (integer
expressions tend to give the same results with or without
optimizations; float expressions do not).

If you go to 64-bit floats, then you should compare against the range
of 64-bit integers, which modern CPUs also support at full speed for
basic operations.
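
A small sketch of that range comparison, assuming double is an IEEE
754 64-bit float (53 significant bits).  The helper name
survives_double is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v comes back unchanged after a round trip through a
 * 64-bit double, 0 if the double had to round it. */
static int survives_double(int64_t v)
{
    /* Stay within +/- 2^62 so the conversion back to int64_t is
     * always in range. */
    assert(v >= -((int64_t)1 << 62) && v <= ((int64_t)1 << 62));

    /* volatile forces a real 64-bit store, so no extra precision is
     * carried in a wider register. */
    volatile double d = (double)v;
    return (int64_t)d == v;
}
```

Under that assumption 2^53 survives the round trip but 2^53 + 1 does
not, while int64_t represents every integer up to 2^63 - 1 exactly.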