I have been reviewing the TestBot and GitLab CI test results for the
merged MRs. While doing that I updated the TestBot's known failures list
(https://testbot.winehq.org/FailuresList.pl) in order to drive down the
false positive rate.
Incidentally I also collected the list of test units causing false
positives, so I'll start with that. Specifically, here are the bugs to
fix to help the GitLab CI:
* Bug 53433 - mmdevapi:capture - impacted 18 MRs
* Bug 54064 - ntdll:threadpool - impacted 15 MRs
* Bug 54078 - ntdll:pipe - impacted 11 MRs
* Bug 54140 - mmdevapi:render - impacted 5 MRs
* Bug 54005 - ole32:clipboard - impacted 5 MRs
* Bug 54037 - user32:msg - impacted 5 MRs
* Bug 54074 - ws2_32:sock - impacted 5 MRs
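For illustration, a ranking like the one above could be derived from a list of
(MR, test unit) false positive records. This is only a minimal sketch: the
function name, the record layout and the MR numbers in the sample are
hypothetical, and it is not the shell snippets from the attachment.

```python
from collections import defaultdict


def rank_test_units(false_positives):
    """Rank test units by the number of distinct MRs they impacted.

    false_positives: iterable of (mr_id, test_unit) pairs (hypothetical layout).
    Returns a list of (test_unit, impacted_mr_count), most impacted first.
    """
    impacted = defaultdict(set)
    for mr_id, test_unit in false_positives:
        # A set ensures an MR is counted once per test unit,
        # even if the same failure was reported several times.
        impacted[test_unit].add(mr_id)
    return sorted(
        ((unit, len(mrs)) for unit, mrs in impacted.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Made-up sample records, for demonstration only.
sample = [
    (101, "ntdll:pipe"),
    (101, "ntdll:pipe"),
    (102, "ntdll:pipe"),
    (102, "user32:msg"),
]
for unit, count in rank_test_units(sample):
    print(f"{unit} - impacted {count} MRs")
```

Counting distinct MRs rather than individual failures keeps a single very
noisy MR from dominating the ranking.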
I classified the TestBot / GitLab CI results as follows:
* False positive
Cases where the CI system incorrectly claimed the MR introduces new
failures. This is typically the case when the failures are already
present in the nightly WineTest results.
* Bad merge
MRs that break a test and got merged anyway.
* Collateral damage from a bad merge
The false positives (aka collateral damage) caused by one of the bad
merges above.
* Outside interference
This identifies false positives that are not random or intrinsic to
the test but that result from changes outside the Wine infrastructure,
for instance certificates that expire, or configuration changes to
servers that break the tests that depend on them.
Of those the only ones that a CI can really avoid are the first type,
aka "False positive". So I calculated the corresponding weekly rate:
Adjusted False Positive rate
Week | TestBot | GitLab CI
2022-11-14 | 21.9% | 8.3%
2022-11-21 | 8.0% | 21.6%
2022-11-28 | 14.7% | 28.4%
2022-12-05 | 8.5% | 24.5%
2022-12-12 | 0.0% | 20.0%
Note that the TestBot's 8% rate for the 11-21 week is not representative
because Wine was broken that week (collateral damage) which prevented
the tests from running in Wine, and thus from contributing real "false
positives". Also the 12-12 week is still incomplete obviously.
Even so I think this shows the TestBot is improving.
Here's a list of the incidents for the weeks above:
* 11-14 An external certificate revocation issue caused crypt32:cert to
fail systematically. This impacted 14 merge requests and was fixed in
MR!1360.
* 11-17 MR!1399 got merged despite the TestBot detecting that it
prevented 32-bit Wine tests from running to completion. This impacted
39 merge requests. I could have reduced that number if I had been
faster to reconfigure the TestBot to stop running the full 32-bit Wine
test suite. This was fixed in MR!1524.
* 11-17 MR!1398 got merged despite the TestBot detecting that it broke
ntoskrnl.exe:ntoskrnl on Windows 7. This was fixed in MR!1803.
* 11-22 MR!1495 got merged despite the TestBot detecting that it broke
vbscript:run on Windows *. I don't have a record of the impacted MRs
or of when it was fixed.
* 11-23 The b00a831d direct commit broke kernel32:process in Wine. This
has since been fixed.
* 12-07 MR!1732 got merged despite the TestBot detecting that it broke
taskschd:scheduler on Windows *. I immediately added a known failure
entry and no MR got impacted. This was fixed in MR!1736.
Without filtering out the failures caused by these incidents, the false
positive rate is:
Raw False Positive rate
Week | TestBot | GitLab CI
2022-11-14 | 52.1% | 27.1%
2022-11-21 | 50.0% | 29.5%
2022-11-28 | 20.0% | 33.7%
2022-12-05 | 19.1% | 57.4%
2022-12-12 | 0.0% | 20.0%
I think that also shows that the TestBot is improving.
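To make the difference between the two tables concrete, here is a minimal
sketch of how both rates could be computed from classified results. It rests
on assumptions that are not spelled out above: that a rate is the share of
MRs in a week with at least one failure of the relevant kind, that the
adjusted rate counts only the "False positive" category, and that the raw
rate also counts collateral damage and outside interference. The function
name, record layout and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical labels mirroring the classification used in this report.
FALSE_POSITIVE = "false positive"
BAD_MERGE = "bad merge"
COLLATERAL = "collateral damage"
OUTSIDE = "outside interference"


def weekly_rates(results):
    """Compute (adjusted, raw) false positive percentages per week.

    results: iterable of (week, mr_id, category) tuples, where category is
    one of the labels above, or None for a clean run (assumed layout).
    """
    totals = defaultdict(set)
    adjusted = defaultdict(set)
    raw = defaultdict(set)
    for week, mr_id, category in results:
        totals[week].add(mr_id)
        if category == FALSE_POSITIVE:
            adjusted[week].add(mr_id)
        # Assumption: the raw rate keeps incident-related false positives,
        # but a bad merge itself is a correct detection and is not counted.
        if category in (FALSE_POSITIVE, COLLATERAL, OUTSIDE):
            raw[week].add(mr_id)
    return {
        week: (
            100.0 * len(adjusted[week]) / len(mrs),
            100.0 * len(raw[week]) / len(mrs),
        )
        for week, mrs in sorted(totals.items())
    }


# Made-up sample data, for demonstration only.
sample = [
    ("W1", 1, None),
    ("W1", 2, FALSE_POSITIVE),
    ("W1", 3, COLLATERAL),
    ("W1", 4, None),
    ("W2", 5, BAD_MERGE),
    ("W2", 6, OUTSIDE),
]
for week, (adj, raw_rate) in weekly_rates(sample).items():
    print(f"{week} | adjusted {adj:.1f}% | raw {raw_rate:.1f}%")
```

With this definition the raw rate can never be lower than the adjusted one
for the same week, which matches the two tables above.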
I have attached the raw data I collected and shell snippets to
extract various statistics (failures-mr.txt) as well as a spreadsheet
import (failures.xls).
--
Francois Gouget <fgouget(a)codeweavers.com>