WineTest failure patterns preview

Sun May 2 19:54:31 CDT 2021

On 5/2/21 7:07 PM, Francois Gouget wrote:
> On Sat, 1 May 2021, Zebediah Figura (she/her) wrote:
> [...]
>> Looks like a more sophisticated version of
>> <https://www.winehq.org/~jwhite/2deb8c2825af.html>, which is definitely a nice
>> resource when I'm trying to put effort into fixing test failures.
> 
> Right. I should probably have mentioned this bug which says Jer's
> page was part of the inspiration. But that page did not do what I
> need so I tweaked it.
> 
> https://bugs.winehq.org/show_bug.cgi?id=48164
> 
> Oh. And now the official pages are online and getting more feature
> complete.
> 
> https://test.winehq.org/data/patterns-tb-win.html
> https://test.winehq.org/data/patterns-tb-wine.html
> 
> 
>> I guess the tests are color-coded by number of failures, modulo some constant?
> 
> Right. Each failure type (timeout, crash, etc.) has its own color. And
> then I use a gradient to attribute a color to each 'vanilla' failure
> count.
> 
> Note that what counts for allocating the colors is not the actual
> failure counts, but the number of different failure counts. That is a
> test with 4, 5 or 6 failures will get the same colors as one with 1, 2
> or 100 failures because in both cases there are only 3 different values.
> 
> I'll add a description of the patterns on the pages at some point.
> 
> 
>> I like the idea. I will note though that some of those colours seem
>> hard to tell apart, e.g. the shades of green in wine d3d9:device.
> 
> Yes. When a test unit has 30 different failure counts it's hard to find
> enough easy-to-distinguish colors. It's probably possible to do better
> by tweaking the colors the gradient goes through.
> 
> https://source.winehq.org/git/tools.git/blob/HEAD:/winetest/build-patterns#l617
> 
> The cyan-green-yellow part of the gradient produces colors that are not
> very easy to distinguish. The colors in the yellow-red part seem easier
> to identify but that gradient is given the same weight as the other two.
> I've experimented a bit with a darker cyan but going too dark does not
> look very nice.
> 
> 
>> Also I guess they aren't consistent across tests for some reason?
> 
> The goal is to maximize the contrast in the colors used by each pattern.
> But if I used a single 'color map' for all test units, I would need to
> allocate a hundred different colors. Then many test units with just a
> few failures would end up only using very similar colors.
> 
> Allocating one color map per test unit limits this issue to just a few
> patterns. And the best fix would be to reduce the number of failures in
> these tests ;-)
> 

I'll admit I don't fully follow your logic.

I guess if it were me, I'd use a fixed colormap of a small fixed number 
(16? I'm guessing there) of colors that are easy to distinguish, and 
then universally assign colors by (n % 16). I'd also pick out those 
colors manually instead of trying to generate them. Yeah, you won't be 
able to distinguish between 1 failure and 17 failures, but hopefully 
that contrast won't come up very much. Plus, that way, you could even 
learn a mental association, I guess, for whatever that's worth.
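
To make the modulo idea concrete, here is a rough Python sketch. The
palette values and the name color_for_count are made up for
illustration; they are not taken from build-patterns or from the
WineTest pages, and 16 is still just a guess.

```python
# Hypothetical hand-picked palette of 16 colors. The hex values are
# placeholders, not colors used by the real WineTest pattern pages.
PALETTE = [
    "#e6194b", "#3cb44b", "#ffe119", "#4363d8",
    "#f58231", "#911eb4", "#46f0f0", "#f032e6",
    "#bcf60c", "#fabebe", "#008080", "#e6beff",
    "#9a6324", "#800000", "#808000", "#000075",
]


def color_for_count(failures):
    """Map a failure count to a palette entry using n % 16.

    The same count gets the same color in every test unit, at the
    cost of counts that differ by a multiple of 16 colliding.
    """
    if failures < 0:
        raise ValueError("failure count cannot be negative")
    return PALETTE[failures % len(PALETTE)]
```

With this mapping, 1 failure and 17 failures land on the same entry,
which is the collision mentioned above. Special results (timeout,
crash, and presumably success) would keep their own colors outside
this table.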

Or you could assign 1-16 to individual colors and anything greater than
16 to another color. Of course many tests have very large numbers of
failures (usually repeated).
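
A sketch of that second variant in Python, under the same caveats:
the color names and the function name color_for_count_clamped are
hypothetical placeholders, not anything from the existing scripts.

```python
# Placeholder names standing in for 16 hand-picked colors, plus one
# extra color shared by every count above 16. All names are hypothetical.
PALETTE = ["color%02d" % i for i in range(1, 17)]
OVERFLOW = "overflow"


def color_for_count_clamped(failures):
    """Give counts 1..16 their own color and lump the rest together."""
    if failures < 1:
        raise ValueError("expected at least one failure")
    if failures <= len(PALETTE):
        return PALETTE[failures - 1]
    return OVERFLOW
```

Here 1 and 17 no longer collide, but 17 and 1000 do, so tests with
very large failure counts all end up looking the same.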

That's kind of splitting hairs, of course.

Now, another thing occurs to me that would be very useful. It doesn't
necessarily preclude any of the above, but it does sort of obviate its
usefulness: generate a list of failures by line, or even by line +
failure message. I'd envision this as one failure per row, with an "X"
flag for each machine + day that displays it. Of course I'm sure you
already have plenty of ideas on expanding the page; I'm just throwing
out one of my own here.
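
For what it's worth, a minimal Python sketch of the per-line list
could look like the following. The record layout (machine, day, line,
message), the function name failure_matrix, and the sample data are
all assumptions made up for illustration; the real reports would need
their own parsing.

```python
from collections import defaultdict


def failure_matrix(records):
    """Group failures by (line, message) and flag the runs that hit them.

    records is an iterable of (machine, day, line, message) tuples.
    This layout is assumed, not the real WineTest report format.
    Returns (columns, rows): columns is the sorted list of (machine, day)
    runs that reported at least one failure, and rows maps each
    (line, message) key to a list of "X" / "." flags, one per column.
    """
    seen = defaultdict(set)
    runs = set()
    for machine, day, line, message in records:
        run = (machine, day)
        runs.add(run)
        seen[(line, message)].add(run)
    columns = sorted(runs)
    rows = {}
    for key in sorted(seen):
        rows[key] = ["X" if run in seen[key] else "." for run in columns]
    return columns, rows


# Made-up sample data, only to show the shape of the output.
records = [
    ("vm-a", "2021-05-01", "device.c:123", "Got unexpected hr"),
    ("vm-b", "2021-05-01", "device.c:123", "Got unexpected hr"),
    ("vm-a", "2021-05-02", "device.c:456", "Test failed"),
]
columns, rows = failure_matrix(records)
for (line, message), flags in rows.items():
    print(line, message, " ".join(flags))
```

Keying on the line alone instead of line + message would only mean
dropping the message from the dictionary key. Runs without any
failure do not show up as columns here, since the input only lists
failures.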