[Bug 48166] New: test.winehq.org Provide a way to track individual failures

Sun Nov 24 12:29:12 CST 2019

https://bugs.winehq.org/show_bug.cgi?id=48166

            Bug ID: 48166
           Summary: test.winehq.org Provide a way to track individual
                    failures
           Product: WineHQ.org
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: www-unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

A test unit may have multiple unrelated test failures. Some may fail on recent
Windows 10 machines while others may only happen in certain locales or with
some graphics cards. Untangling these can actually be automated.

The core idea is to merge the failures of two reports together while
associating each failure with the tag of the report(s) it originates from. To
do so, diff the two failure lists:
* Failures that are present in both get both tags.
* Failures that are only present in the first report get only that tag.
* Same thing for failures that are present only in the second report.
All failures are then inserted into the merged list in the order in which
Algorithm::Diff returns them.
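As a rough illustration, here is a minimal Python sketch of the two-report merge. It hand-rolls a longest-common-subsequence diff as a stand-in for Perl's Algorithm::Diff, so the relative order of unmatched lines may not be exactly what Algorithm::Diff would return. The function names and the representation of a merged entry as a (failure text, set of tags) pair are illustrative assumptions.

```python
from typing import List, Set, Tuple

# A merged entry: the failure text plus the set of report tags it was seen in.
Entry = Tuple[str, Set[str]]

def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t

def merge_two(a: List[str], tag_a: str, b: List[str], tag_b: str) -> List[Entry]:
    """Merge two failure lists in diff order, tagging each failure by origin."""
    t = lcs_table(a, b)
    i = j = 0
    merged: List[Entry] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Present in both reports: gets both tags.
            merged.append((a[i], {tag_a, tag_b}))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            # Only in the first report.
            merged.append((a[i], {tag_a}))
            i += 1
        else:
            # Only in the second report.
            merged.append((b[j], {tag_b}))
            j += 1
    # Whatever is left over exists in only one of the reports.
    merged.extend((x, {tag_a}) for x in a[i:])
    merged.extend((x, {tag_b}) for x in b[j:])
    return merged
```

For example, merging ["x", "y", "z"] tagged "A" with ["x", "z", "w"] tagged "B" gives x and z both tags, y only "A" and w only "B".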

Once two failure lists have been merged, more failure lists can be merged in
the same way. This yields a unified list of the failures for a given commitid,
and appending that commitid to the tags makes it possible to build a complete
list covering all the available history.
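A possible way to keep merging is to diff each new report against the list merged so far, as in the Python sketch below. The "tag@commitid" tag format, the (tag, commitid, failures) input tuples and the function names are hypothetical choices made for illustration. As above, a hand-written LCS diff stands in for Algorithm::Diff.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, Set[str]]

def merge_into(merged: List[Entry], report: List[str], tag: str) -> List[Entry]:
    """Diff one more report against the already-merged list and fold it in."""
    a = [text for text, _ in merged]
    n, m = len(a), len(report)
    # Longest-common-subsequence lengths of a[i:] and report[j:].
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == report[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    out: List[Entry] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == report[j]:
            out.append((a[i], merged[i][1] | {tag}))  # seen again: add the tag
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            out.append(merged[i])                     # not in this report
            i += 1
        else:
            out.append((report[j], {tag}))            # new to the merged list
            j += 1
    out.extend(merged[i:])
    out.extend((x, {tag}) for x in report[j:])
    return out

def merge_history(reports: List[Tuple[str, str, List[str]]]) -> List[Entry]:
    """reports: (report tag, commitid, failure lines), merged one by one."""
    merged: List[Entry] = []
    for tag, commitid, failures in reports:
        merged = merge_into(merged, failures, f"{tag}@{commitid}")
    return merged
```

Because the commitid is part of every tag, the same machine at two different commits counts as two distinct tags, which is what lets the merged list carry the whole history.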

Then all failures that have exactly the same set of tag+commitid combinations
can be grouped together. Since the failures in different groups do not always
happen together, they must depend on different factors.
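In code, the grouping step can amount to using the frozen set of tags as a dictionary key. This is a minimal Python sketch. It assumes merged entries are (failure text, set of tag+commitid strings) pairs, and `group_failures` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

def group_failures(
    merged: List[Tuple[str, Set[str]]]
) -> Dict[FrozenSet[str], List[str]]:
    """Group failures that occurred in exactly the same set of runs."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for text, tags in merged:
        # frozenset makes the tag set hashable; order of tags is irrelevant.
        groups[frozenset(tags)].append(text)
    # Groups come out in the order their first failure appears in the list.
    return dict(groups)
```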

Intermittent failures, timeouts, crashes
----------------------------------------

If two failures are related but a random timeout or crash sometimes occurs
between them, they might end up being incorrectly split into two separate groups.

So if a crash or a timeout occurs, any other failure in that run should be
ignored when grouping failures together.

This can be achieved by prefixing the tag with a '*' if a crash or timeout
occurred. Then, when grouping failures together, ignore the entries whose tag
starts with a '*'. But when building the occurrence pattern, do use the
entries starting with a '*' to show all the test runs where at least one of the
failures in the group occurred.
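The '*' convention can be captured by three small helpers, sketched in Python below. The helper names and the idea of passing a per-run crash/timeout flag are illustrative assumptions. The grouping key drops starred tags, while the occurrence set keeps them with the '*' removed.

```python
from typing import FrozenSet, Iterable, Set, Tuple

def make_tag(run_tag: str, crashed_or_timed_out: bool) -> str:
    """Prefix a run's tag with '*' if that run had a crash or a timeout."""
    return "*" + run_tag if crashed_or_timed_out else run_tag

def grouping_key(tags: Set[str]) -> FrozenSet[str]:
    """Key used to group failures: starred (crash/timeout) runs are ignored."""
    return frozenset(t for t in tags if not t.startswith("*"))

def occurrence_runs(members: Iterable[Tuple[str, Set[str]]]) -> Set[str]:
    """All runs where at least one failure of the group occurred, including
    the starred runs (with the '*' stripped) so the pattern shows them too."""
    runs: Set[str] = set()
    for _text, tags in members:
        runs.update(t[1:] if t.startswith("*") else t for t in tags)
    return runs
```

With this, a failure seen in {"vm1@c1", "*vm2@c1"} is grouped as if it had only occurred in vm1@c1, yet vm2@c1 still shows up in its group's pattern.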

Because entries starting with a '*' are ignored when building failure groups,
the failures that cause a crash/timeout and the crash/timeout line itself will
not belong to any failure group. So do a second pass over the unassigned
failures, this time without ignoring the entries starting with a '*'. This will
create groups composed of the crash/timeout and any related failures.
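A minimal Python sketch of the two passes follows, again assuming (failure text, set of tags) entries with starred tags marking crash/timeout runs; `two_pass_groups` is a hypothetical name. Failures whose tags are all starred get an empty key in the first pass, so they are set aside and grouped on their full tag set in the second pass.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Entry = Tuple[str, Set[str]]

def two_pass_groups(merged: List[Entry]) -> List[List[str]]:
    # Pass 1: group on the non-starred tags only.
    first: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    unassigned: List[Entry] = []
    for text, tags in merged:
        key = frozenset(t for t in tags if not t.startswith("*"))
        if key:
            first[key].append(text)
        else:
            # Only seen in crash/timeout runs: left for the second pass.
            unassigned.append((text, tags))
    # Pass 2: group the leftovers on their full tag set, stars included.
    second: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for text, tags in unassigned:
        second[frozenset(tags)].append(text)
    return list(first.values()) + list(second.values())
```

In practice the second pass tends to collect a crash line together with the failures printed just before it in the same runs.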

New failures
------------

This analysis indicates where and when a given failure group happened. This
means it can also be used to detect new failures.

It would not be useful to define a new failure as one that never happened
before the latest commit: blink and you might miss it. Instead the definition
should be broadened to all failures that only happened in the first half of the
available history. This may occasionally misclassify rare intermittent failures
as new, but that should be uncommon enough.
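The "first half of the available history" rule could be checked as in the Python sketch below. It assumes `history` is a list of commitids ordered so that its first half is the half meant above. It also assumes the caller supplies the commitids where the group occurred, for instance extracted from the group's tags. Rounding the half up for odd lengths is an arbitrary choice.

```python
from typing import Iterable, List

def is_new(group_commits: Iterable[str], history: List[str]) -> bool:
    """True if every commit where the group occurred lies in the first half
    of `history` (rounded up for odd lengths)."""
    first_half = set(history[: (len(history) + 1) // 2])
    commits = set(group_commits)
    # A group with no occurrences at all is not considered new.
    return bool(commits) and commits <= first_half
```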

Presentation of the results
---------------------------

The results can be presented on a page with one box per test unit, like the
'Full Report' pages. A 'details' link under the test unit name on the test
failures pattern page would link to the relevant section of the full page.

Inside each box there would be a sequence of lines showing the failures in a
group, followed by the usual pattern showing which machines the failure happens
on; then the next failure group, etc. For instance:

console.c:270: Test failed: got 16, expected 6
console.c:275: Test failed: got 16, expected 6
   .....F..F...F..F.mmm Win8  vm1
   ......FF.e...FF.e..F Win8  vm1-ja

096c:console: unhandled exception c0000005 at 6F384E33
   .....CC              Win8  vm2-new

As usual, the items in the pattern would link to the relevant pages, allowing
one to dig deeper into the issue. The same color coding would be used for the
pattern, but since a failure group always has the same number of failures, only
one color would be used for the F code.
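A single pattern line could be rendered roughly as in this Python sketch. It assumes one character per commit in `history`: 'F' if the group occurred, '.' if the machine ran that commit without it, and ' ' if there is no run. The other codes in the example above ('e', 'm', 'C') are left out because their exact meaning is not spelled out here. The function names and parameters are hypothetical, and links and colors are omitted.

```python
from typing import List, Set

def render_pattern(history: List[str], runs: Set[str], failed: Set[str]) -> str:
    """One character per commit for a single machine and failure group."""
    return "".join(
        "F" if c in failed else ("." if c in runs else " ")
        for c in history
    )

def render_line(history: List[str], runs: Set[str], failed: Set[str],
                os_name: str, machine: str) -> str:
    """Indented pattern followed by the Windows version and machine name."""
    return f"   {render_pattern(history, runs, failed)} {os_name:<5} {machine}"
```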

The failure line numbers may change from one run to the next, so either zero
them out or only retain one value picked at random. Similarly, if a failure
message contains timing information (see bug 48094), remove that timing
information.
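Normalization could be done with a couple of regular expressions before the failure lists are diffed, as in this Python sketch. It zeroes out the number in the "file.c:NNN:" prefix seen in the example above. The timing regular expression is purely a hypothetical placeholder (a number followed by "s" or "ms"), since the actual timing format is defined in bug 48094, not here.

```python
import re

# Source line number right after the test file name, e.g. "console.c:270:".
LINE_RE = re.compile(r"^([\w.-]+\.c):\d+:")
# Hypothetical timing format: a duration such as "1.5s" or "120 ms".
TIME_RE = re.compile(r"\b\d+(?:\.\d+)?\s?m?s\b")

def normalize(line: str) -> str:
    """Make failure lines from different runs comparable."""
    line = LINE_RE.sub(r"\1:0:", line)      # zero out the line number
    return TIME_RE.sub("<time>", line)      # drop the (assumed) timing info
```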

If a failure is identified as new, put its lines in bold orange, like on the
TestBot. This will allow quick identification of the new failures on the page.
