In parallel with the MR test results, I analyzed the "new" test failures
in each nightly WineTest run and updated the TestBot known failures
accordingly.
A "new" failure in the WineTest results is equivalent to a "false
positive" in the MR tests: many of them are not actually new but are
either rare failures or failures whose message changes in every run
because it contains random values (e.g. pointers, handles).
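To illustrate the random-value problem, here is a minimal Python sketch
of how such messages could be compared across runs. It is not the
TestBot's actual implementation: the normalize() and new_failures()
helpers, the "<hex>" placeholder and the regular expression are all
assumptions made for the example.

```python
import re

# Assumption: run-specific values show up as 0x-prefixed hex or as bare
# 8- or 16-digit hex strings (a common way to print pointers/handles).
_VARIABLE = re.compile(
    r"\b(?:0x[0-9a-fA-F]+|[0-9a-fA-F]{16}|[0-9a-fA-F]{8})\b"
)

def normalize(message):
    """Replace values that may differ between runs with a placeholder."""
    return _VARIABLE.sub("<hex>", message)

def new_failures(current, known):
    """Return the messages in 'current' that match no known failure.

    Messages are compared after normalization, and duplicates within
    'current' are reported only once (first occurrence wins).
    """
    known_keys = {normalize(m) for m in known}
    seen = set()
    result = []
    for msg in current:
        key = normalize(msg)
        if key in known_keys or key in seen:
            continue
        seen.add(key)
        result.append(msg)
    return result
```

The trade-off is that this also hides stable hexadecimal values (e.g.
error codes) and 8-digit decimal numbers, so two genuinely different
failures on the same line can collapse into one.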
Analyzing the WineTest results is particularly important for the Windows
VMs:
* The TestBot runs the full 32-bit test suite in Wine for every MR, which
ensures good coverage of the tests and plenty of opportunities to
detect flaky tests.
* But on Windows the TestBot only runs the tests directly impacted
by the MR, which provides fewer opportunities to detect which tests
are flaky on Windows.
So analyzing the nightly Windows WineTest results provides more
opportunity for detecting which tests are unreliable on Windows before
they cause trouble for MRs.
The results show improvements: the runs now regularly have fewer than 10
new failures, whereas before 11-29 there were over 20 in each run (and
almost 20 when deduplicated, see the attached spreadsheet). There's quite
a bit of noise so we'll know more in the coming weeks.
Also, the Windows 11 test configurations no longer have too many new
failures, so it should be possible to test the MRs against them soon.
I attached the raw data in failures-winetest.txt and an updated
spreadsheet (failures.xls).
--
Francois Gouget <fgouget(a)codeweavers.com>