[Bug 48912] New: Allow blacklisting unreliable and always new failures

Sat Apr 11 04:47:10 CDT 2020

https://bugs.winehq.org/show_bug.cgi?id=48912

            Bug ID: 48912
           Summary: Allow blacklisting unreliable and always new failures
           Product: Wine-Testbot
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

A number of tests produce false positives because:
* Either the failure message contains a value that changes with each run (e.g.
an HWND), so the failure always looks new. In some cases these values are really
necessary for diagnosis and moving them to a separate line would make the code
quite a bit more complex.
* Or the test fails intermittently and is hard to fix. Sometimes the cause of
the failures may also come from external factors such as bugs in QEmu (for
instance long execution delays causing timeouts).
* Sometimes the intermittent failure is the product of a new Windows version or
configuration, which requires investigating before a fix is found.

In all cases the tests should be fixed eventually, but a solution is needed in
the interim so these tests do not make the TestBot so unreliable that its
results are ignored.

So the goal is to provide a way to blacklist some test failures so they do not
cause a patch to be rejected. Some safeguards are needed to ensure that:
* Failures are not blacklisted lightly. The preference should always be to fix
the test.
* The blacklist does not get so large that it becomes hard to maintain.
* Once a test is fixed the corresponding blacklist entries are removed in a
timely fashion.
* There is still an incentive to fix the tests.

So here is a proposal for implementing this blacklist:

* Each blacklist entry should be associated with a Wine bug describing the
issue. That bug should provide some minimal diagnosis: whether it's a new
Windows behavior, a race condition or some issue that was reported to QEmu.
That would ensure we know why the blacklist entry was added. One could also
check the status of the bug when reviewing the blacklist entries. A closed bug
would be a strong hint that the blacklist entry is no longer needed.

* Rather than blacklisting whole test units, blacklist entries should target
specific test failures via a regular expression. Besides being finer grained,
this would be useful for cases like user32:win, which has different issues
depending on the locale, each of which should be linked to a different bug
(bugs 48815, 48819 and 48820).

* The TestBot should record when each blacklist entry was last used. This
relies on having the above regular expression since without it the TestBot
would not know anything beyond 'the test unit was run and had failures'. Also,
since the regular expression would only be applied to *new* failures, this would
record the last time the blacklist entry was actually useful.

An entry that was unused for a long time would be a prime candidate for
reviewing the corresponding bug and for removal. (Note: The blacklist would
also be used on WineTest reports, so it would get a chance to match its
target at least 5 days a week.)

* The blacklist entries would only be needed for 'base' test configurations
since those are the only ones wine-devel patches are run on.

* There should be a page listing the blacklist entries so developers have a
good starting point to work on them.

* Ideally the blacklist page would also point to the tasks where the blacklist
was last used. Since most of the blacklisted failures are either intermittent
or specific to a given test configuration this would make it easier for
developers to find reports where the failure did happen.

  Note that Wine VMs often test multiple configurations per task (e.g. wow32
and wow64, different locales), each producing its own test report. So pointing
at just the task would leave the developer guessing which report should be
looked at. But that may be sufficient guidance.

* Keying entries on (test:unit, testbot-vm) tuples makes it impossible to target
a specific Wine test configuration such as a specific locale, since they all run
on the same VM. Similarly, it would make blacklisting bitness-blind on Windows
VMs. If necessary the tuple could be extended either with the specific mission
the blacklist applies to, or with the basename of the corresponding report (the
latter being easier to use in comparisons). Whether that's practical and worth
the effort remains to be determined. One complication, for instance, is that
this would lead to more blacklist entries: many 64-bit VMs would need two
entries, one for 32-bit tests and one for 64-bit tests.

* Pseudo database schema and sample use:
  FailureBlacklists
  -----------------

  PK Bug             48815
  PK TestModule      user32
  PK TestUnit        win
     Name            0x738 message
     FailureRegExp   Test failed: hwnd [0-9A-F]{8,16} message 0738
     LastUse         2020-03-27

  FailureBlacklistVMs
  -------------------

  PK Bug             48815
  PK TestModule      user32
  PK TestUnit        win
  PK VMName          Entries for w1064v1709, w1064v1809, etc.

  (48815, user32, win, w1064v1709)
  (48815, user32, win, w1064v1809)
  (48815, user32, win, w1064v1809_2scr)
  ...

  FailureBlacklistUses (optional)
  -------------------------------

  PK Bug
  PK TestModule
  PK TestUnit
  PK JobId
  PK StepNo
  PK TaskNo

  (48815, user32, win, 68507, 1, 7)
  (48815, user32, win, 68508, 1, 7)
  ...
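To make the proposal more concrete, here is a minimal Python sketch of how such entries could be applied, assuming the schema above: each entry carries its regular expression and the set of VMs it applies to, only *new* failures from base configurations are passed in, and LastUse is updated whenever an entry absorbs one of them. The names BlacklistEntry, filter_new_failures() and stale_entries(), as well as the 90-day review threshold, are hypothetical and do not come from the TestBot code.

```python
import re
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set


@dataclass
class BlacklistEntry:
    """One FailureBlacklists row plus its FailureBlacklistVMs rows (hypothetical model)."""
    bug: int
    test_module: str
    test_unit: str
    name: str
    failure_regexp: str
    vm_names: Set[str] = field(default_factory=set)
    last_use: Optional[date] = None

    def matches(self, module: str, unit: str, vm_name: str, failure: str) -> bool:
        # The entry only applies to its own test unit and to the listed VMs.
        if (module, unit) != (self.test_module, self.test_unit):
            return False
        if vm_name not in self.vm_names:
            return False
        return re.search(self.failure_regexp, failure) is not None


def filter_new_failures(blacklist: List[BlacklistEntry], module: str, unit: str,
                        vm_name: str, new_failures: List[str],
                        today: date) -> List[str]:
    """Return the new failures that no blacklist entry covers.

    Callers are assumed to pass only *new* failures from base configurations,
    so last_use records when an entry actually prevented a false positive.
    """
    remaining = []
    for failure in new_failures:
        matched = False
        for entry in blacklist:
            if entry.matches(module, unit, vm_name, failure):
                entry.last_use = today  # this entry was useful today
                matched = True
        if not matched:
            remaining.append(failure)
    return remaining


def stale_entries(blacklist: List[BlacklistEntry], today: date,
                  max_age: timedelta = timedelta(days=90)) -> List[BlacklistEntry]:
    """Entries never used, or unused for longer than max_age: review their bug."""
    return [e for e in blacklist
            if e.last_use is None or today - e.last_use > max_age]
```

For instance, with the 48815 entry restricted to w1064v1709 and w1064v1809, filter_new_failures() drops a new "Test failed: hwnd 000A0123 message 0738" line reported on w1064v1809 and sets LastUse to the current date, while any other new failure is still reported. Running stale_entries() periodically would then list the entries whose bugs should be reviewed.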
