A TestBot results analysis [Re: [PATCH 2/2] kernel32: Match the Windows 10 1709+ GetConsoleFontInfo().]

Wed Nov 6 21:20:05 CST 2019

tl;dr; Failures not caused by my patch!

First the reason why the TestBot reran all of the kernel32 tests on Wine 
is that I patched Wine's kernel32 dll. The TestBot does not really 
understand what the patch does so it assume that could break any of the 
dll's tests.

In fact a change in kernel32 could break pretty much any Wine test so to 
be thorough all the tests should be rerun. But while technically the 
TestBot has been able to do this for over a year, rerunning all the 
tests would take a lot of time (requiring more hardware resources), and, 
more importantly, as this case shows the quality of the Wine tests is 
just too poor for this to be workable.

Marvin wrote:
> === debian10 (32 bit report) ===
[...]
> debugger: Timeout

I'm starting the analysis with this timeout. It's intermittent. The 
exception that preceded the timeout is pretty frequent, but the timeout 
happens less frequently. The last time it happened in this VM 
configuration was on October 31st: 
https://test.winehq.org/data/ccec532879ec14b2e79da08288152a69221ec4d1/linux_newtb-debian10-win32/ke$

It is to avoid these false positives I'm working on bug 47998.
https://bugs.winehq.org/show_bug.cgi?id=47998

[...]
> kernel32:
> change.c:320: Test failed: should be ready

This failure too is intermittent:
https://test.winehq.org/data/tests/kernel32:change.html

But it's much rarer. So much so that it never happened in this VM 
configuration, i.e. debian10 + win32 build + English, in the last 29 
WineTest runs on record.

It did happen four times in other configurations though:
Oct 31 - debian10 + win32 build + French
Oct 27 - debian10 + win32 build + Japanese
Oct 22 - debian10 + wow64 build + English
Sep 25 - debian10 + win32 build + Japanese

Based on the test it does not look like the language would have an 
impact on this test so this failure could really happen at any time in 
any kernel32:change run.

But the TestBot cannot know that and, since this failure did not happen 
in this specific configuration in the history available to the TestBot, 
a fix for bug 47998 would not help with this false positive.

Sometimes fixing an intermittent failure is the only solution for 
cleaner test runs.

> === debian10 (64 bit WoW report) ===
> 
> kernel32:
> debugger.c:320: Test failed: GetThreadContext failed: 5
> debugger.c:320: Test failed: GetThreadContext failed: 5

This might be confusing. The TestBot used the November 5 WineTest run as 
a reference which did have two of these failures. So why is it reporting 
these two failures as new?

That's because this run has 4 such failures in total. So the TestBot 
ignores the first two but correctly reports the next two as new.

In fact this failure happens a variable number of times because it's 
called for each of the process' threads and fails randomly.
https://source.winehq.org/git/wine.git/blob/5b62f89baa82daecd430897de0bb5cab32c5fc5e:/dlls/kernel32/tests/debugger.c#l320

On Oct 28 it happened 5 times so a fix for bug 47998 would have avoided 
this particular false positive. But there's always the risk of having 
one more failure than the historical record.

Should the TestBot deduplicate the failure lines to avoid such issues? (*)

Duplicate failures (identical line number and failure) typically happen 
when a test has an array containing test information and loops on it. In 
such a case a patch adding a new entry that causes an extra failure 
should be flagged which would not happen if the TestBot is deduplicating 
the failures.

But such loops should contain the index of the tested array entry in 
their message which would thwart the deduplication.

The kernel32:debugger test is a special case in that there does not seem 
to be a meaningful way to differentiate the failures so it's a good case 
for deduplicating.

(*) If it is to be implemented the deduplication should be performed 
    while extrating the failures from the test report. That's because 
    once the failures have been extracted, any trace that we present 
    between them has been lost. But such traces provide context and thus 
    should prevent the deduplication from happening. This is also a 
    place where line numbers can be taken into account (they are ignored 
    when comparing failures since patches will change them).

-- 
Francois Gouget <fgouget at codeweavers.com>