[Bug 50898] New: Improve performance for RGB lookups into color tables conversion

WineHQ Bugzilla wine-bugs at winehq.org
Wed Mar 31 07:32:09 CDT 2021


https://bugs.winehq.org/show_bug.cgi?id=50898

            Bug ID: 50898
           Summary: Improve performance for RGB lookups into color tables
                    conversion
           Product: Wine
           Version: 6.5
          Hardware: x86-64
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: gdi32
          Assignee: wine-bugs at winehq.org
          Reporter: gabrielopcode at gmail.com
      Distribution: ---

Created attachment 69710
  --> https://bugs.winehq.org/attachment.cgi?id=69710
Generate a lookup cache for RGB into color table conversion

This improves the performance of color conversion from RGB to palette
significantly by generating a lookup cache for 5-bit-per-color RGB values
(centered) into the palette, before doing the conversion, for older
applications and games. The comments in the patch should describe the algorithm
I came up with for large color tables.

Note that Windows also quantizes the lookup to 5-bit-per-color already, so it
very likely does something similar (but seems to do more caching based on my
tests); we have this quantization in wine's code to match Windows, but we don't
take advantage of it, so it's wasted.

Generating it by looking through the color table for each entry in the lookup
cache wouldn't be a good idea if the color table is large. The lookup cache
itself is 32768 entries, which turns out to be the equivalent of a 256x128
image. The current algorithm used by wine works like that already: it goes
through each pixel in the image, and for each pixel, it looks up every color in
the color table, which is extremely slow. Generating the cache like that would
mean images have to be at least 256x128 for it to have any benefit, otherwise
it would be a performance regression instead, and that's not very reasonable to
expect of older applications, and it wouldn't provide much.

So, the table generation algorithm is much more efficient than that for large
amount of colors in the table. In my synthetic tests generating 4096 tables
with various color tables of 256 colors, it turned out to be at least 4x
faster, sometimes even 4.5x depending on color table (even a purely random
color table), for example a random table taking about 14406ms instead of
62056ms on my CPU.

In fact, the algorithm is not far from being constant in terms of amount of
colors in the table, taking only 50% more time from 16 colors to 256. So
basically increasing the amount of colors from 16 to 256 doesn't make that much
difference, since each color usually fills its own part of the lookup cache,
which is not filled again (unless distance is shorter due to quantization, but
that's only for one generation anyway). In contrast, going through each color
in the table for each entry (the original algorithm) would massively increase
the amount of time spent (from less than 10 seconds, up to more than a minute,
or 6x slower).

For a few benchmark examples: a 256-color table with the algorithm takes about
14000ms to do 4096 iterations on my CPU, a 40-color table takes about 12000ms,
so it's only 17% slower with this algorithm. The 40 colors seems to be the
cutoff on my CPU where the algorithms start to differ in performance, with the
straight one being faster at lower amount of colors, presumably due to better
cache locality and branch prediction.


A fast and easy way to "see this in action" is to download Winamp, then launch
it and right-click on the top, click on the "Nullsoft Winamp..." menu item and
then go to the Credits tab. Depending on your CPU you can clearly see the FPS
difference and/or CPU usage for one core. (Note that it still has some fps
drops, this is normal and happens on Windows as well, even when the CPU core is
not maxed out, it's inherent in the app)

http://web.archive.org/web/20180902190237/http://winamparchive.org/dl/winamp5666_full_all-build3516.exe

Further improvements would require caching it somehow between calls, but I
believe that requires either more fragile changes to gdi32 code, or semi-hacks
to deal with a typical applications' expectations, so this should be a good
start for now.

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.



More information about the wine-bugs mailing list