[Bug 48164] New: test.winehq.org should provide an efficient way to detect new failures

WineHQ Bugzilla wine-bugs at winehq.org
Sun Nov 24 11:17:31 CST 2019


https://bugs.winehq.org/show_bug.cgi?id=48164

            Bug ID: 48164
           Summary: test.winehq.org should provide an efficient way to
                    detect new failures
           Product: WineHQ.org
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: www-unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

Problem
-------

test.winehq.org does not allow performing the following tasks efficiently:
1. Detecting when a new failure slips past the TestBot.
   One can detect new failures on the per-test-unit page when specific columns
turn red. But quite often the test unit already has failures, so one has to
compare the exact failure counts. Furthermore, many test units currently have
failures, so this requires checking 80+ pages individually.

2. Detecting when the results on a VM degrade.
   After upgrading a machine it is useful to compare its new results to its
previous ones. But the results for each date are on separate pages, so again it
is necessary to go through the per-test-unit result pages.

3. Comparing the results of two machines of different platforms.
   For instance comparing the results of running Windows 8 to those of
   Windows 10 on the same hardware.

Other things that got asked:
4. Sometimes it would be nice to have only the failures, without all the lines
for skipped tests and todos.

5. In some cases it would also be nice to have pages with only the failures
that happen on TestBot VMs since these are usually easier to reproduce.


Jeremy's page
-------------

Jeremy's test summary page can help with some of that:
https://www.winehq.org/~jwhite/latest.html

But:
* It's not integrated with test.winehq.org, which makes it hard to find.
* There are only two states, Success and Failed, so it does not help when a
test goes from having 2 failures to 4, or when it has a set of systematic
failures and a set of intermittent ones.
* The failed/success pattern is not per VM, which masks some patterns and does
not help with point 2.


Proposal
--------

A modified version of Jeremy's page could be integrated with test.winehq.org:

* It would actually be a pair of 'Failures' pages, one for TestBot VMs and one
for all test results. Both would be linked to from the top of the main index
page, for instance using the same type of 'prev | next' text links used on the
other pages.

* Jeremy's result matrix would be extended from three to four dimensions: test
units, test results, time, and number/type of failures.

* As before the results would be grouped per test unit in alphabetical order.
  Only the test units having at least one error, recent or not, would be shown.
  This could again be in the form of an array ('full report' pages on
test.winehq.org) or simply test unit titles (TestBot jobDetails page style)
with the information about each test unit inside. Clicking on the test unit
name would link to its 'test runs' page on test.winehq.org.

* For each test unit there would be one line per test result having errors. The
first part of the line would have one character per commit for the whole
history available on test.winehq.org. That character would indicate if the test
failed and more.
  The second part of the line would be the test result's platform and tag. The
lines would be sorted by platform and then alphabetically by tag.

* Each test result would get a one character code:
   .  Success
   F  Failure
   C  Crash
   T  Timeout
   m  Missing dll (foo=missing or other error code)
   e  Other dll error (foo=load error 1359 and equivalent)
   _  No test (the test did not exist)
  ' ' No result (the machine did not run the tests that day)

* These codes would be shown using a monospace font so they would form a
pattern across time and test results:
   .....F..F...F..F.mmm Win8  vm1
   .....FFFFeFFFFFFeFFF Win8  vm1-ja
   ...TTCC              Win8  vm2-new
   ......eF...F...F..F. Win10 vm3

* Each character would have a tooltip containing details like the meaning of
the letter, the number of failures, or the dll error message.
  They would also link to the corresponding section of the test report.

* In addition to the character the background would be color coded to make
patterns more visible.
   .  Green
   F  Green to yellow to red gradient
   C  Dark red
   T  Purple/pink
   m  Cyan
   e  Dark blue
   _  Light gray
  ' ' White

* The green-yellow-red gradient is what allows detecting changes in the number
of test failures. That gradient must be consistent across all lines of a given
test unit's pattern.
  Furthermore the gradient must not be computed directly from a test result's
failure count: if a test unit has either 100 or 101 failures, those must not
have nearly indistinguishable colors. Instead the set of all distinct failure
counts for the test unit should be collected, zero should be added to that set,
and the values sorted, with a color attributed to each *index*. The background
color is then selected based on the index of that result's failure count.
  It is expected that each set will be relatively small, so the colors will be
reasonably far apart, making it easy to distinguish a shift from 4 to 6
failures even if there are 100 failures from time to time.
  Also note that adding zero to the set essentially reserves green for
successful results.
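The index-based gradient can be sketched as follows. This is a minimal
illustration in Python; the function name and the evenly spaced mapping are
assumptions, not the actual implementation:

```python
# Sketch of the index-based gradient: each distinct failure count is
# assigned a position on the green-to-red gradient by its *rank* among
# the counts seen for the test unit, not by its absolute value.

def gradient_positions(failure_counts):
    """Map each distinct failure count to a position in [0.0, 1.0]."""
    # Adding 0 reserves the first index (pure green) for successful runs.
    counts = sorted(set(failure_counts) | {0})
    if len(counts) == 1:
        return {0: 0.0}
    step = 1.0 / (len(counts) - 1)
    return {count: i * step for i, count in enumerate(counts)}

# A test unit whose results had 4, 6 or 100 failures.
positions = gradient_positions([4, 6, 100, 4])
```

With a count-proportional gradient, 4 and 6 failures would both sit at the
green end next to each other; with this index-based mapping they are a full
gradient step apart even though 100 dwarfs them both.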


Implementation feasibility
--------------------------

* No changes in dissect.

* In gather, generate a new testunits.txt file containing one line per test
unit:
  - The data source would be the per-report summary.txt files.
    -> These don't indicate when a timeout has occurred, so timeouts will
appear as F instead, which is acceptable for a first implementation.
  - The first line would contain a star followed by the tags of all the test
runs used to build the file.
  - The other lines would contain the name of the test unit followed by
space-separated pairs of result code/failure count and result tag (including
the platform).
  - A line would be written out even if the test unit had no failure.

  For instance, the commit1 testunits.txt file could contain:
  * win8_vm1 win8_vm1-ja win8_vm2-new win10_vm3
  foo:bar 43 win8_vm1-ja C win8_vm2-new e win10_vm3
  foo:bar2

  - In the example above win8_vm1 only appears on the first line. This means
WineTest was run on that machine but had no failure at all.
  - If the results for commit2 refer to a win8_vm4 machine, we will know that
the reason win8_vm4 does not appear in the commit1 file is not that all the
tests succeeded, but that WineTest was not run on win8_vm4 for commit1. This
means that the result code for win8_vm4 for commit1 should be ' ', not '.',
for all test units.
  - If commit2 has results for the foo:bar3 test unit, then we will know that
the reason it is not present in the commit1 file is not that all the test runs
were successful, but that foo:bar3 did not exist yet. So its result code would
be '_', not '.'.
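To make the format concrete, here is a sketch of how a consumer could parse one
commit's testunits.txt and apply the '.' default for machines that ran the
tests but had no failures. The function name is hypothetical; only the file
format itself comes from the proposal above:

```python
# Hypothetical parser for the testunits.txt format described above.

def parse_testunits(text):
    """Return (tags, {unit: {tag: code}}) for one commit's testunits.txt."""
    lines = text.splitlines()
    # The first line is a star followed by the tags of all test runs used.
    tags = lines[0].split()[1:]
    units = {}
    for line in lines[1:]:
        fields = line.split()
        unit, results = fields[0], {}
        # The remaining fields are pairs: result code or failure count,
        # followed by the result tag (which includes the platform).
        for code, tag in zip(fields[1::2], fields[2::2]):
            results[tag] = code
        units[unit] = results
    return tags, units

tags, units = parse_testunits(
    "* win8_vm1 win8_vm1-ja win8_vm2-new win10_vm3\n"
    "foo:bar 43 win8_vm1-ja C win8_vm2-new e win10_vm3\n"
    "foo:bar2\n"
)
# A machine listed on the first line but absent from a unit's results ran
# the tests without failures, so it gets the '.' code.
for results in units.values():
    for tag in tags:
        results.setdefault(tag, ".")
```

After this pass, win8_vm1 has code '.' for foo:bar, and foo:bar2 (which had no
failures anywhere) has '.' for all four tags.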

* Add a new build-failures script to generate both failures pages.
  - This script will need to read the testunits.txt file for all the commits.
The simplest implementation will be to read all the data into memory before
generating the page. This will avoid having to deal with keeping the current
test unit synchronized between all of the testunits.txt files when a new test
unit has been added.

  - The combined size of the testunits.txt files is expected to be reasonable,
within a factor of 3 of the summary.txt files. For reference, here is some data
about the sizes involved:
    $ du -sh data
    21G     data
    $ ls data/*/*/report | wc -l
    2299
    $ cat data/*/*/report | wc
    34,087,987 231,694,407 2,104,860,095
    $ cat data/*/*/report | egrep '(: Test failed:|: Test succeeded inside
todo block:|done [(]258[)]|Unhandled exception:)' | wc
    567,158 6,275,504 53,202,999
    $ cat data/*/summary.txt | wc
    186,219 3,046,363 30,596,901

  - Having a function to generate the page will allow calling it twice in a row
to generate both pages without having to load and parse the testunits.txt files
twice.
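  Under those assumptions, the build-failures flow could be sketched like this
(hypothetical names; the actual script would read the per-commit testunits.txt
files instead of the in-memory tuples used here, and handling of the '_' code
is omitted):

```python
# Sketch of the build-failures flow: load every commit's data into memory
# first, then generate each page from the same loaded data.

def load_history(parsed_commits):
    """parsed_commits: [(commit, tags, units)] oldest first, where units
    maps each test unit to a {tag: code} dict as in testunits.txt."""
    history = {}  # unit -> tag -> {commit: code}
    for commit, tags, units in parsed_commits:
        for unit, results in units.items():
            for tag in tags:
                # A tag listed for the commit but absent from the unit's
                # results ran the tests without failures: code '.'.
                code = results.get(tag, ".")
                history.setdefault(unit, {}).setdefault(tag, {})[commit] = code
    return history

def build_page(history, commits, tag_filter):
    """Return (unit, tag, pattern) lines for the tags tag_filter accepts."""
    lines = []
    for unit in sorted(history):
        for tag in sorted(history[unit]):
            if not tag_filter(tag):
                continue
            # ' ' means the machine did not run the tests for that commit.
            pattern = "".join(history[unit][tag].get(c, " ") for c in commits)
            lines.append((unit, tag, pattern))
    return lines

# Two commits: vm2 only ran for c1, and foo:bar failed there.
parsed = [("c1", ["vm1", "vm2"], {"foo:bar": {"vm2": "F"}}),
          ("c2", ["vm1"], {"foo:bar": {"vm1": "C"}})]
history = load_history(parsed)
all_results = build_page(history, ["c1", "c2"], lambda tag: True)
```

Calling build_page a second time with a TestBot-only tag filter would produce
the second page without re-reading or re-parsing anything.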
