DIB Engine & GSoC
Ge van Geldorp
ge at gse.nl
Fri Mar 2 15:39:25 CST 2007
> If I understand it correctly, there was an idea floating
> around that Ge van Geldorp had which was to auto generate the
> needed DIB engine code for certain color depths and functions
> so you would not have to implement the whole thing at once.
> if I remember correctly he did a proof of concept
> implementation for ReactOS by first creating the generic
> interfaces that were needed and then generating the code for
> the more simple color depths so as not to break all the
> existing hacks they had. Maybe I am off base here and Ge will
> comment as he lurks on wine-devel and I have cc'd him on this email.
This was specifically for BitBlt'ing. You have (potentially) 3 surfaces
involved in a BitBlt, the source surface, the destination surface and a
brush surface. For DIBs, the source and destination surface can be either 1,
4, 8, 16, 24 or 32 bits deep plus source and destination can have different
depths. The brush can be either a solid brush or a patterned brush. There
are 256 different ways to combine the surfaces (Raster Ops or ROPs).
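To make the "256 ways" concrete: each result bit depends on three input bits (pattern, source, destination), giving 8 input combinations and therefore 2^8 = 256 possible truth tables. The sketch below evaluates any of them from the 8-bit ROP index (0xCC for SRCCOPY, 0xF0 for PATCOPY, 0x66 for SRCINVERT). The function name rop3_apply is made up for illustration and is not taken from the ReactOS or Wine sources.

```c
#include <stdint.h>

/* Evaluate one of the 256 ternary raster operations on whole bytes.
 * 'rop' is the 8-bit ROP index. Bit n of 'rop' is the result for the
 * input combination n = (pattern << 2) | (source << 1) | dest. */
static uint8_t rop3_apply(uint8_t rop, uint8_t pat, uint8_t src, uint8_t dst)
{
    uint8_t result = 0;
    for (int bit = 0; bit < 8; bit++) {
        unsigned p = (pat >> bit) & 1u;
        unsigned s = (src >> bit) & 1u;
        unsigned d = (dst >> bit) & 1u;
        unsigned index = (p << 2) | (s << 1) | d;
        if ((rop >> index) & 1u)
            result |= (uint8_t)(1u << bit);
    }
    return result;
}
```

This is exactly the kind of slow, fully generic code the rest of this mail is about avoiding in the inner loop.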
All of these variables mean that, although it's possible to write generic
code to handle everything, that code is going to be littered with if's.
That makes generic code slow for simple operations like filling a rectangle
with a solid color. So you want to special-case the most-often used cases
and make them fast, while using the slow, generic code as a fallback.
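A minimal sketch of that fast-path-plus-fallback idea, assuming an 8-bit surface and a solid brush. The Surface8 type and the pat_blt_8bpp and generic_rop names are hypothetical and only serve the illustration; this is not the ReactOS code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-bit surface: one byte per pixel, 'stride' bytes per row. */
typedef struct {
    uint8_t *bits;
    int stride;
} Surface8;

/* Slow path helper: look up each result bit in the ROP's truth table. */
static uint8_t generic_rop(uint8_t rop, uint8_t pat, uint8_t src, uint8_t dst)
{
    uint8_t result = 0;
    for (int bit = 0; bit < 8; bit++) {
        unsigned index = (((pat >> bit) & 1u) << 2) |
                         (((src >> bit) & 1u) << 1) |
                          ((dst >> bit) & 1u);
        if ((rop >> index) & 1u)
            result |= (uint8_t)(1u << bit);
    }
    return result;
}

/* Apply a ROP that ignores the source, using a solid brush color. */
static void pat_blt_8bpp(Surface8 *dst, int x, int y, int w, int h,
                         uint8_t color, uint8_t rop)
{
    if (rop == 0xF0) {
        /* Fast path: PATCOPY with a solid brush is a plain fill. */
        for (int row = 0; row < h; row++)
            memset(dst->bits + (y + row) * dst->stride + x, color, (size_t)w);
        return;
    }
    /* Fallback: generic per-pixel evaluation (source input passed as 0). */
    for (int row = 0; row < h; row++) {
        uint8_t *line = dst->bits + (y + row) * dst->stride + x;
        for (int col = 0; col < w; col++)
            line[col] = generic_rop(rop, color, 0, line[col]);
    }
}
```

The common case costs one memset per row, while every other ROP still produces a correct result through the slow path.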
If you want to get the best performance, you need to write a lot of
almost-identical-but-slightly-different code. For example, in the innermost
loop you'll need to actually perform the ROP. But with 256 possible ROPs
that can be quite a number of if's to execute. And you're doing that inside
the innermost loop. To speed up things, I moved the ROP determination to the
outermost level. Based on the ROP one of 256 possible subroutines is called,
which in its innermost loop can just combine the bits in a way hardcoded for
that specific ROP (i.e. no more if's; there's just e.g. "*Dest = *Src ^
*Dest" there). Actually, in the end I didn't use 256 subroutines but only
used subroutines for the most common ROPs (those with a symbol like PATCOPY)
and used a catch-all generic subroutine for the lesser used ROPs. All these
subroutines are almost the same; it's just the actual ROP code that's
different between them. And for some ROPs there's no source surface
involved, so for those ROPs you don't need to advance pointers into the
source surface when you're moving from row to row etc. (meaning you can't
just use a preprocessor macro, the changes between the subroutines are a bit
too complicated for that).
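The structure described above might look roughly like the following sketch: the ROP is examined once, a routine is picked, and each routine has its operation hardcoded in the inner loop. The Blt8 struct and all function names here are invented for illustration (the real generated code uses its own BltInfo structure), and only two ROPs are special-cased.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical blit description for an 8-bit destination. */
typedef struct {
    uint8_t *dst;        int dst_stride;
    const uint8_t *src;  int src_stride;  /* src may be NULL if the ROP ignores it */
    uint8_t solid_color;                  /* solid brush */
    int width, height;
    uint8_t rop;                          /* 8-bit ROP index, e.g. 0x66 */
} Blt8;

typedef void (*BltFn)(const Blt8 *b);

/* SRCINVERT (0x66): hardcoded XOR, no decisions in the inner loop. */
static void blt_srcinvert(const Blt8 *b)
{
    for (int y = 0; y < b->height; y++) {
        uint8_t *Dest = b->dst + y * b->dst_stride;
        const uint8_t *Src = b->src + y * b->src_stride;
        for (int x = 0; x < b->width; x++) {
            *Dest = *Src ^ *Dest;
            Dest++;
            Src++;
        }
    }
}

/* PATCOPY (0xF0) with a solid brush: the source is never touched. */
static void blt_patcopy_solid(const Blt8 *b)
{
    for (int y = 0; y < b->height; y++)
        memset(b->dst + y * b->dst_stride, b->solid_color, (size_t)b->width);
}

/* Catch-all: evaluate the ROP truth table bit by bit for every pixel. */
static void blt_generic(const Blt8 *b)
{
    for (int y = 0; y < b->height; y++) {
        for (int x = 0; x < b->width; x++) {
            uint8_t *d = b->dst + y * b->dst_stride + x;
            uint8_t s = b->src ? b->src[y * b->src_stride + x] : 0;
            uint8_t out = 0;
            for (int bit = 0; bit < 8; bit++) {
                unsigned idx = (((b->solid_color >> bit) & 1u) << 2) |
                               (((s >> bit) & 1u) << 1) |
                                ((*d >> bit) & 1u);
                if ((b->rop >> idx) & 1u)
                    out |= (uint8_t)(1u << bit);
            }
            *d = out;
        }
    }
}

/* The ROP is examined once, at the outermost level. */
static BltFn select_blt(uint8_t rop)
{
    switch (rop) {
    case 0x66: return blt_srcinvert;
    case 0xF0: return blt_patcopy_solid;
    default:   return blt_generic;
    }
}

static void blt8(const Blt8 *b)
{
    select_blt(b->rop)(b);
}
```

Note how blt_patcopy_solid has no source pointer at all, while blt_srcinvert has to advance one per row and per pixel. That structural difference is what makes a simple macro awkward.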
That's where the code generator came in. It generated all those slightly
different subroutines for the standard ROPs and a generic routine for the
rest. For an example, see http://oss.gse.nl/wine/dib8gen.c which contains
the generated bitblt routines for an 8-bit destination surface. Compare for
example the DIB_8BPP_BitBlt_Generic() routine near the top (the catch-all
one) with DIB_8BPP_BitBlt_PATCOPY() further down. The last one has very
tight inner loops (especially when BltInfo->PatternSurface is NULL, meaning
you're filling the destination rectangle with a solid color) compared to the
generic routine.
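To give a feel for what such a generator does, here is a toy sketch that emits the C text of one specialized routine from a small description of the ROP. It is not the actual generator: the RopSpec type, the gen_rop_routine function, the Blt8_ prefix and the layout of the emitted code are all assumptions made for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of one ROP to generate a routine for. */
typedef struct {
    const char *name;    /* routine suffix, e.g. "SRCINVERT" */
    const char *expr;    /* C expression for the new destination value */
    int uses_source;     /* nonzero if the routine has to walk the source */
} RopSpec;

/* Write the C text of a specialized routine into buf.
 * Returns the snprintf result: the length needed, or negative on error. */
static int gen_rop_routine(char *buf, size_t size, const RopSpec *spec)
{
    return snprintf(buf, size,
        "static void Blt8_%s(const Blt8 *b)\n"
        "{\n"
        "    for (int y = 0; y < b->height; y++) {\n"
        "        unsigned char *Dest = b->dst + y * b->dst_stride;\n"
        "%s"
        "        for (int x = 0; x < b->width; x++) {\n"
        "            *Dest = %s;\n"
        "            Dest++;\n"
        "%s"
        "        }\n"
        "    }\n"
        "}\n",
        spec->name,
        spec->uses_source
            ? "        const unsigned char *Src = b->src + y * b->src_stride;\n"
            : "",
        spec->expr,
        spec->uses_source ? "            Src++;\n" : "");
}
```

Calling it with { "SRCINVERT", "*Src ^ *Dest", 1 } produces a routine that walks both surfaces, while { "PATCOPY", "b->solid_color", 0 } produces one with no source pointer at all. A real generator would loop over a table of such entries and also handle patterned brushes and the other bit depths.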
Of course, at the time I did measure performance to see if all this
optimization stuff indeed improved performance. And it did, dramatically
even. It's been a while, so I don't recall most of the performance numbers
anymore, but I do remember that I benchmarked some of the DIB BitBlt
operations and found that the generated code in ReactOS was about 3 times as
fast as the DIB code in Windows XP (on the same hardware of course).
I was actually quite proud of that code generator and the code it produced.
The DIB code generator is absolutely clean: MS doesn't ship anything like it,
so it's simply impossible that it was created using reverse engineering. I
put the code generator under the LGPL, specifically so it could be used by
Wine if so desired. Note that the scope of this is limited to BitBlt's,
though; it won't help when you need to draw a 3-pixel wide dash-dotted
ellipse on your DIB surface...
Gé van Geldorp.