A TestBot results analysis [Re: [PATCH 2/2] kernel32: Match the Windows 10 1709+ GetConsoleFontInfo().]
Francois Gouget
fgouget at codeweavers.com
Wed Nov 6 21:20:05 CST 2019
tl;dr; Failures not caused by my patch!
First the reason why the TestBot reran all of the kernel32 tests on Wine
is that I patched Wine's kernel32 dll. The TestBot does not really
understand what the patch does so it assume that could break any of the
dll's tests.
In fact a change in kernel32 could break pretty much any Wine test so to
be thorough all the tests should be rerun. But while technically the
TestBot has been able to do this for over a year, rerunning all the
tests would take a lot of time (requiring more hardware resources), and,
more importantly, as this case shows the quality of the Wine tests is
just too poor for this to be workable.
Marvin wrote:
> === debian10 (32 bit report) ===
[...]
> debugger: Timeout
I'm starting the analysis with this timeout. It's intermittent. The
exception that preceded the timeout is pretty frequent, but the timeout
happens less frequently. The last time it happened in this VM
configuration was on October 31st:
https://test.winehq.org/data/ccec532879ec14b2e79da08288152a69221ec4d1/linux_newtb-debian10-win32/ke$
It is to avoid these false positives I'm working on bug 47998.
https://bugs.winehq.org/show_bug.cgi?id=47998
[...]
> kernel32:
> change.c:320: Test failed: should be ready
This failure too is intermittent:
https://test.winehq.org/data/tests/kernel32:change.html
But it's much rarer. So much so that it never happened in this VM
configuration, i.e. debian10 + win32 build + English, in the last 29
WineTest runs on record.
It did happen four times in other configurations though:
Oct 31 - debian10 + win32 build + French
Oct 27 - debian10 + win32 build + Japanese
Oct 22 - debian10 + wow64 build + English
Sep 25 - debian10 + win32 build + Japanese
Based on the test it does not look like the language would have an
impact on this test so this failure could really happen at any time in
any kernel32:change run.
But the TestBot cannot know that and, since this failure did not happen
in this specific configuration in the history available to the TestBot,
a fix for bug 47998 would not help with this false positive.
Sometimes fixing an intermittent failure is the only solution for
cleaner test runs.
> === debian10 (64 bit WoW report) ===
>
> kernel32:
> debugger.c:320: Test failed: GetThreadContext failed: 5
> debugger.c:320: Test failed: GetThreadContext failed: 5
This might be confusing. The TestBot used the November 5 WineTest run as
a reference which did have two of these failures. So why is it reporting
these two failures as new?
That's because this run has 4 such failures in total. So the TestBot
ignores the first two but correctly reports the next two as new.
In fact this failure happens a variable number of times because it's
called for each of the process' threads and fails randomly.
https://source.winehq.org/git/wine.git/blob/5b62f89baa82daecd430897de0bb5cab32c5fc5e:/dlls/kernel32/tests/debugger.c#l320
On Oct 28 it happened 5 times so a fix for bug 47998 would have avoided
this particular false positive. But there's always the risk of having
one more failure than the historical record.
Should the TestBot deduplicate the failure lines to avoid such issues? (*)
Duplicate failures (identical line number and failure) typically happen
when a test has an array containing test information and loops on it. In
such a case a patch adding a new entry that causes an extra failure
should be flagged which would not happen if the TestBot is deduplicating
the failures.
But such loops should contain the index of the tested array entry in
their message which would thwart the deduplication.
The kernel32:debugger test is a special case in that there does not seem
to be a meaningful way to differentiate the failures so it's a good case
for deduplicating.
(*) If it is to be implemented the deduplication should be performed
while extrating the failures from the test report. That's because
once the failures have been extracted, any trace that we present
between them has been lost. But such traces provide context and thus
should prevent the deduplication from happening. This is also a
place where line numbers can be taken into account (they are ignored
when comparing failures since patches will change them).
--
Francois Gouget <fgouget at codeweavers.com>
More information about the wine-devel
mailing list