WineTest failure patterns preview

Mon May 3 11:58:32 CDT 2021

On Sun, 2 May 2021, Zebediah Figura (she/her) wrote:
[...]
> I guess if it were me, I'd use a fixed colormap of a small fixed number (16?
> I'm guessing there) of colors that are easy to distinguish, and then
> universally assign colors by (n % 16).

16 colors is not enough, particularly not if using a single palette for 
all the tests.

For instance the record holder is user32:clipboard with 81 different 
failure counts: 
https://test.winehq.org/data/patterns.html#user32:clipboard

So with a 16 color palette there would be a lot of wrapping and that 
would likely make the pattern unreadable.
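
To make the wrapping concrete, here is a minimal sketch, assuming the 81 distinct failure counts are simply 1 through 81 (hypothetical data, the real counts for user32:clipboard differ). The helper name palette_collisions is made up for this illustration.

```python
from collections import defaultdict

def palette_collisions(failure_counts, palette_size=16):
    """Group distinct failure counts by their color index (n % palette_size)."""
    by_color = defaultdict(list)
    for n in sorted(set(failure_counts)):
        by_color[n % palette_size].append(n)
    return dict(by_color)

# Hypothetical data: 81 distinct failure counts, 1 through 81.
groups = palette_collisions(range(1, 82))

# Largest number of distinct counts that end up with the same color.
worst = max(len(counts) for counts in groups.values())
print(worst)

# Every count that would be drawn in the same color as "1 failure".
print(groups[1])
```

With 81 distinct counts and 16 colors, at least one color has to stand for six different counts, whatever the actual values are.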

Also note that even with the current scheme one can clearly see that the 
cw-rx460 machine has more failures than the other test configurations. 
Partly because it's almost the only machine present in that pattern, and 
partly because it has more yellow/red which are the colors of higher 
failure counts.
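
For readers unfamiliar with this kind of scheme, a gradient from blue (few failures) to red (many failures) could be generated along these lines. This is only a sketch of the general idea, not the code behind the patterns page; count_to_color and the choice of an HSV hue sweep are assumptions.

```python
import colorsys

def count_to_color(n, max_count):
    """Map a failure count to a hex color, from blue (few) to red (many)."""
    if max_count <= 1:
        frac = 0.0
    else:
        # Clamp n to [1, max_count] and scale it to the range [0, 1].
        frac = (min(max(n, 1), max_count) - 1) / (max_count - 1)
    # In HSV, a hue of 2/3 is blue and 0 is red; yellow lies in between.
    hue = (2.0 / 3.0) * (1.0 - frac)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255))

for n in (1, 17, 81):
    print(n, count_to_color(n, 81))
```

Unlike a fixed 16 color palette, every distinct count gets its own color here, and higher counts always sit closer to the red end.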

In contrast the non-English w10pro64 VMs have fewer failures (blue) and 
all the same color (and hence count). This suggests they have a 
different cause.
(for cw-rx460 it's the Radeon driver, I have not looked at w10pro64 yet)

user32:input is another case where the current color scheme works pretty 
well despite the high number of different failure counts (31).

https://test.winehq.org/data/patterns.html#user32:input

(And it shows something pretty bad happened on cw-gtx560-1909 around 
Apr 2nd. Now I just have to figure out what)

For reference, here are the 'high scores':
  81 user32:clipboard
  41 user32:win
  31 user32:input
  27 ole32:clipboard
  26 d3d11:d3d11
  25 user32:msg
  21 user32:sysparams
  20 d3d10core:d3d10core

> I'd also pick out those colors manually instead of trying to generate 
> them.

I'm fine with someone picking the colors manually but I'm not an artist 
and agonising over each color is not going to be a time saver for me.

> Yeah, you won't be able to distinguish between 1 failure and 17 
> failures, but hopefully that contrast won't come up very much.

Distinguishing between 1 and 17 failures is super important: it's the 
difference between catching a commit that introduces 16 new failures in 
the days after it's committed, and letting it slip through the cracks, 
only to be rediscovered months later when the author has vanished.

> Plus, that way, you could even learn a mental association, I guess,
> for whatever that's worth.

Precisely: what is it worth?
What advantage does being able to identify at a glance that two test 
units have the same number of failures gain us?

[...]
> Now, another thing that occurs to me that would be very useful, and which
> doesn't necessarily preclude any of the above but does sort of obviate its
> usefulness, is to generate a list of failures by line, or even by line +
> failure message.

Line numbers are useless for tracking failures: they change almost every 
time a test is modified. Matching on the message may work better, though 
some messages contain 'random' content (pointers, etc.). Fortunately 
those are relatively rare.
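
Matching on the message could look something like the sketch below. It assumes failure lines have the shape "file.c:LINE: Test failed: message" and that pointers show up as 0x-prefixed hex or as bare 8 or 16 digit hex values; both are assumptions for illustration, and failure_key is a hypothetical helper.

```python
import re

# Assumed shape of a failure line: "file.c:LINE: Test failed: message".
FAILURE_RE = re.compile(
    r"^(?P<file>[\w.-]+):(?P<line>\d+): Test failed: (?P<msg>.*)$"
)

# Assumed shape of 'random' content: 0x-prefixed hex, or bare 8/16 digit hex.
POINTER_RE = re.compile(
    r"\b0x[0-9a-fA-F]+\b|\b[0-9a-fA-F]{8}(?:[0-9a-fA-F]{8})?\b"
)

def failure_key(report_line):
    """Return a (file, normalized message) key, or None for other lines.

    The line number is deliberately dropped and pointer-like values are
    replaced by a placeholder so the key survives unrelated test edits.
    """
    m = FAILURE_RE.match(report_line)
    if m is None:
        return None
    msg = POINTER_RE.sub("<ptr>", m.group("msg"))
    return (m.group("file"), msg)

# Two hypothetical failures that differ only in line number and pointer.
a = failure_key("clipboard.c:123: Test failed: got handle 0012FE40")
b = failure_key("clipboard.c:131: Test failed: got handle 0034AB10")
print(a == b)
```

The pointer pattern is crude: it would also swallow a legitimate 8 digit number, so it would need tuning against real reports.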

> I'd envision this as one per row, with "X" flags on each machine + day 
> that displays it.

Web pages are two-dimensional. So if rows are failure messages that only 
leaves columns to show both the reports and builds. That feels like one 
too many.

Or maybe instead of one box per test unit you meant to have one per 
failure message? That's likely going to be many boxes (there are already 
327 test units that had failures in the past 2 months!!!).

I had a possibly related idea for tracking individual failures but I'm 
not entirely sure it would work in practice: 
https://bugs.winehq.org/show_bug.cgi?id=48166

> Of course I'm sure you already have plenty of ideas on expanding the 
> page; I'm just throwing out one of my own here.

Not that many actually.
* Adding some sort of documentation.

* Adding links to potentially related Git commits.

* Adding a global pattern based on the number of failed test units.
  (that one also highlights the cw-rx460 issues pretty well)

* Adding links to related bugs. But ideally that would use Bugzilla's 
  REST API which is only available in Bugzilla >= 5.0 (WineHQ still runs 
  4.4.13, I don't know if it's worth upgrading Bugzilla just for this). 

That's about it.

-- 
Francois Gouget <fgouget at codeweavers.com>