RFC How to get rid of "always new" TestBot false positives?
Francois Gouget
fgouget at codeweavers.com
Tue Mar 31 07:25:41 CDT 2020
On Fri, 27 Mar 2020, Henri Verbeet wrote:
[...]
> If the main goal is to stop the testbot from being ignored, and to
> limit the number of new failures sneaking in, would it make sense to
> start with something fairly blunt, like ignoring failures for tests on
> unreliable configurations? E.g., suppose ddraw:ddraw7 reliably passed
> on w1064v1507, but not w1064v1809, you'd then blacklist all of
> ddraw:ddraw7 on w1064v1809. That means you potentially ignore some
> ddraw:ddraw7 tests that are reliable, but it would still be an
> improvement over effectively ignoring everything.
So that would mean maintaining a set of (test:unit, testbot-vm) tuples
where the TestBot should ignore new failures.
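To make the coarse variant concrete, here is a minimal sketch of such a lookup. It is only an illustration: the names IGNORED_UNITS and should_report_new_failures are hypothetical and do not come from the TestBot code, and the single entry is the ddraw:ddraw7 example from the quoted message.

```python
# Hypothetical sketch, not TestBot code: a coarse (test:unit, testbot-vm)
# ignore list as described in the quoted proposal.
IGNORED_UNITS = {
    ("ddraw:ddraw7", "w1064v1809"),  # example pair taken from the quoted message
}


def should_report_new_failures(test_unit, vm_name, new_failures):
    """Return True if the new failures of this test unit on this VM
    should be reported, False if there are none or the pair is ignored."""
    if not new_failures:
        return False
    return (test_unit, vm_name) not in IGNORED_UNITS
```

With this approach every new failure in ddraw:ddraw7 on w1064v1809 is dropped, including ones that are unrelated to the known flaky behavior.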
I'm not very fond of the blacklist approach. Once it's in place it may
be very tempting to just put every flaky test into it rather than fixing
it. This will lead to a long list of exceptions which will have to be
maintained. In particular knowing when to remove an entry will be very
important.
I also worry that once the test failures are papered over there won't be
much incentive to fix them. To be fair that risk is not really different
from what could happen with my patch but the scale would be larger.
But it could work with the rare intermittent failures too which would be
valuable. And it could be useful when introducing new test
configurations that have new intermittent / variable issues. So there
could be value in doing this anyway.
Maybe with some safeguards it can be made to work.
* I think I'd want a Wine bug describing the issue to be associated with
each blacklist entry. That bug should provide some minimal diagnosis:
whether it's a new Windows behavior, a race condition or some issue
that was reported to QEMU. That would ensure we know why the blacklist
entry was added. One could also check the status of the bug when
reviewing the blacklist entries. A closed bug would be a strong hint
that the blacklist entry is no longer needed.
* And I think it would be better to have a regexp that matches only
the troublesome failures rather than to blacklist the whole test unit.
Besides being finer grained this would be useful for cases like
user32:win which has different issues depending on the locale and
where each should be associated with a different bug (bugs 48815, 48819
and 48820).
* I think I'd also want to record the time when the blacklist entry was
last used. This relies on having the above regular expression since
without it the TestBot would not know anything beyond 'the test unit
was run and had failures'. Also the regular expression would only be
used against *new* failures. So this would really record the last time
the blacklist entry was actually useful.
An entry that was unused for a long time would be a prime candidate
for reviewing the corresponding bug and for removal. (Note: The
blacklist would also be used on WineTest reports so it would get a
chance of matching its target at least 5 days / week).
* I'd want a page listing the blacklisted entries so developers have a
good starting point to work on them.
* Ideally the blacklist page would also point to the tasks where the
blacklist was last used. I think this would also be useful for
developers trying to fix the issues, particularly for the rare
intermittent kind.
Note that Wine VMs often test in multiple configurations per task
(e.g. wow32 and wow64, different locales), each producing its own test
report. So pointing at just the task would leave the developer
guessing which report should be looked at. But that's probably ok.
More importantly, (test:unit, testbot-vm) tuples make it impossible
to blacklist a specific Wine test configuration such as a specific
locale since they all run on the same VM. Similarly it would make
blacklisting bitness-blind on Windows VMs.
If necessary the tuple could maybe be extended with the specific
mission the blacklist applies to. But I'm not sure about the specific
impacts and it may not be worth it.
* Pseudo database schema and sample use:

  FailureBlacklists
  -----------------
  PK Bug            48815
  PK TestModule     user32
  PK TestUnit       win
     Name           0x738 message
     FailureRegExp  Test failed: hwnd [0-9A-F]{8,16} message 0738
     LastUse        2020-03-27

  FailureBlacklistVMs
  -------------------
  PK Bug            48815
  PK TestModule     user32
  PK TestUnit       win
  PK VMName         Entries for w1064v1709 w1064v1809 etc.

  (48815, user32, win, w1064v1709)
  (48815, user32, win, w1064v1809)
  (48815, user32, win, w1064v1809_2scr)
  ...

  FailureBlacklistUses (optionally)
  ---------------------------------
  PK Bug
  PK TestModule
  PK TestUnit
  PK JobId
  PK StepNo
  PK TaskNo

  (48815, user32, win, 68507, 1, 7)
  (48815, user32, win, 68508, 1, 7)
  ...

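As a rough illustration of how the regular expression and LastUse fields could work together, here is a small Python sketch. Everything in it is an assumption made for the example: the class and function names are hypothetical, the entries live in memory rather than in a database, and the 90-day staleness threshold is an arbitrary placeholder. Only the field names and the sample regular expression come from the schema above.

```python
import re
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class BlacklistEntry:
    """Hypothetical in-memory mirror of one FailureBlacklists row
    plus its FailureBlacklistVMs rows."""
    bug: int
    test_module: str
    test_unit: str
    name: str
    failure_regexp: str
    vms: set = field(default_factory=set)
    last_use: Optional[date] = None

    def applies_to(self, module, unit, vm):
        # An entry only applies to its own test unit on its listed VMs.
        return (module, unit) == (self.test_module, self.test_unit) and vm in self.vms


def filter_new_failures(entries, module, unit, vm, new_failures, today):
    """Return the new failure lines not covered by any entry.
    Entries that match at least one line get their last_use set to today."""
    remaining = []
    for line in new_failures:
        matched = False
        for entry in entries:
            if entry.applies_to(module, unit, vm) and re.search(entry.failure_regexp, line):
                entry.last_use = today
                matched = True
        if not matched:
            remaining.append(line)
    return remaining


def stale_entries(entries, today, max_age_days=90):
    """Return entries never used, or not used for more than max_age_days
    (the 90-day default is an assumption, not a proposed policy)."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if e.last_use is None or e.last_use < cutoff]
```

Only failures that are new are passed in, so last_use reflects the last time an entry actually suppressed something. The entries returned by stale_entries would be the ones to review against their Wine bug.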
--
Francois Gouget <fgouget at codeweavers.com>