It's been about a year since I started collecting data, and also since
the GitLab CI was introduced. So here's an update on the false
positive rates of the merge request and nightly Wine test runs.
Reminder:
A false positive (FP) is when the TestBot or GitLab CI says a failure
is new when it is not.
* TestBot
The FP rate stayed around 10% until the end of August, when the GitLab
bridge to the mailing list broke (see graphs). Looking at it
differently, except for June, on a given day there was a better than
40% chance that fewer than 10% of the MRs would get a false positive
(and a >70% chance for fewer than 25%).
But with the bridge gone the TestBot failures are no longer relayed to
the MRs, so collecting this data has become impractical, and rather
moot too.
* GitLab CI
The GitLab CI's FP rate stayed below 30% until mid-May but it has
remained clearly above that since then. The 5-week average even peaked
at 60% in early August and is not really getting better.
Changing perspective, since March fewer than 20% of the days had a
false positive rate below 10%. And in August and September every
single day had more than 10% false positives.
Also, before August the chances of having an FP rate lower than 25%
were much greater, usually 40% or more. But that rate has plummeted
and is now below 10%.
The 50% FP line shows great swings, which I think are caused by
periods where one or more tests have a 100% failure rate and do not
get fixed for weeks. Still, in early 2023 it was at 85% or more, and
since then there has been a clear downward trend where both the peaks
and troughs keep getting lower.
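For reference, here is a minimal sketch (with made-up sample numbers,
not the actual MR data) of how a daily FP rate and its trailing 5-week
average can be derived:

```python
# Sketch of computing the daily false positive rate and its 5-week
# (35-day) trailing average. The sample data below is hypothetical;
# the real input comes from the per-day MR results.
from collections import deque

def daily_fp_rate(results):
    """results: list of (total_mrs, mrs_with_false_positive) per day."""
    return [fp / total if total else 0.0 for total, fp in results]

def moving_average(rates, window=35):
    """Trailing moving average over the last `window` days (5 weeks)."""
    out, buf = [], deque(maxlen=window)
    for r in rates:
        buf.append(r)
        out.append(sum(buf) / len(buf))
    return out

days = [(20, 2), (25, 10), (18, 12), (22, 13)]  # made-up sample
rates = daily_fp_rate(days)                     # 0.10, 0.40, 0.67, 0.59
avg = moving_average(rates, window=35)
```

The 35-day window is what smooths out the day-to-day swings enough for
the longer-term trend to show.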
Conclusions:
* I hoped the TestBot FP rate would improve but it has only held steady.
It may be that this 10% failure rate is irreducible because of the
delay between when a new failure pops up and when the TestBot knows
how to identify it (i.e. when I add it to the known failures page:
https://testbot.winehq.org/FailuresList.pl).
Stemming the flow of new failures introduced by bad MRs may help lower
that rate. But new failures can also happen when a certificate
expires, when a test server goes down, or when the build platform
changes, for instance. So there will likely always be a residual FP
rate.
* The GitLab CI seemed to make progress at first but since mid-March
it has been drifting away from the goal of having no false
positives.
Notes:
* Comparing the TestBot and GitLab CI failure rates is akin to comparing
apples and oranges.
The GitLab CI performs a single run of the full test suite in Wine
(except for a handful of tests), plus a single 64-bit test.
The TestBot does:
* 1 full 64-bit run in Wine (no exceptions),
* 1 run of modified tests in a Windows-on-Windows Wine environment,
* 1 run of all tests of modified modules in Wine,
* 7 plain 32-bit Wine runs in various locales,
* 24 tests in various Windows, locale, GPU and screen layout
configurations.
And it still has one half to one third of the GitLab CI's false
positive rate.
* Improving the false positive rate does not mean that the Wine tests
have fewer failures. But getting reliable results from the CI was
deemed a necessary step for developers to trust it and know they
need to rework their MR when the results are bad.
It also means less work for the maintainer to discriminate between MRs
that introduce new failures and those that don't. And less chance of
making mistakes too.
* Conversely, improving the tests does not necessarily improve the false
positive rate. We have 230 failing test units so one can fix 229 of
them but if the last one fails systematically the false positive rate
will stay pegged at 100%.
Reducing the number of false positives requires either focusing on the
tests that cause them, or having countermeasures built into the CI...
as is the case for the TestBot.
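That last point can be illustrated with a toy simulation (hypothetical
numbers, not actual Wine data): as long as a single test unit fails
spuriously on every run, every day sees at least one false positive,
no matter how many of the other tests get fixed.

```python
# Toy simulation: 229 test units fixed (never fail spuriously) plus
# one that fails on every run. The per-day FP rate stays at 100%.
import random

def day_has_false_positive(fail_probs, runs=10):
    """True if any run of the day hits at least one spurious failure.
    fail_probs: per-test-unit probability of a spurious failure."""
    for _ in range(runs):
        if any(random.random() < p for p in fail_probs):
            return True
    return False

random.seed(0)
probs = [0.0] * 229 + [1.0]  # 229 fixed tests + 1 systematic failure
days = 100
fp_days = sum(day_has_false_positive(probs) for _ in range(days))
print(f"FP rate: {fp_days / days:.0%}")  # 100%: one bad test taints every day
```

Dropping that last probability to 0 (or teaching the CI to recognize
the failure) is the only way this simulated rate ever goes down.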
--
Francois Gouget <fgouget(a)codeweavers.com>