[PATCH] kernel32/tests: Simplify the name of the test unit for child processes.

Francois Gouget fgouget at codeweavers.com
Sat Feb 22 06:57:16 CST 2020


So I managed to reproduce this failure:
https://testbot.winehq.org/JobDetails.pl?Key=65628

First ignore the w7pro64 failures: it seems one cannot run the job tests 
twice in a row on Windows 7, presumably because of some missing cleanup. 
So these are not interesting here.

Also I get 3 failures out of 9 Windows 10 runs, where each goes through 
the job tests 10 times. So that's 3 failures in 90 rounds, a failure rate 
of 3.3%. That seems 
inconsistent with the test.winehq.org results: it has between 40 times 
10 (newtb*) and 40 times 16 (newtb*+cw*) WineTest runs and none shows 
this failure, putting the failure rate below 0.25%.

This job is also not sufficient to prove that the issue is specific to 
Windows 10: it does not have enough Windows 8 runs to get a 
statistically significant result. test.winehq.org has between 40*2 
(newtb*) and 40*5 (newtb*+cw*) WineTest runs which still seems 
inconsistent with a 3% failure rate.
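
As a rough sanity check of the numbers above, here is a small sketch that 
computes how likely it would be to see zero failures in the 
test.winehq.org runs if the 3.3% rate applied there too. It assumes every 
run fails independently with the same probability, which is my assumption 
and not something that was measured. The function name is made up for the 
illustration.

```python
def prob_no_failure(failure_rate, runs):
    """Probability that `runs` independent runs all pass,
    assuming each one fails with probability `failure_rate`."""
    return (1.0 - failure_rate) ** runs


# 3 failures in 9 jobs x 10 rounds of the job tests.
observed_rate = 3 / (9 * 10)

# WineTest run counts quoted above for test.winehq.org.
scenarios = [
    ("Windows 10, newtb* only", 40 * 10),
    ("Windows 10, newtb* + cw*", 40 * 16),
    ("Windows 8, newtb* only", 40 * 2),
    ("Windows 8, newtb* + cw*", 40 * 5),
]

print(f"observed failure rate: {observed_rate:.1%}")
for label, runs in scenarios:
    p = prob_no_failure(observed_rate, runs)
    print(f"{label}: {runs} runs, chance of zero failures = {p:.2e}")
```

Under that independence assumption the chance of a clean record shrinks 
quickly with the number of runs, which is why the Windows 10 figures look 
hard to reconcile with a 3.3% rate while the smaller Windows 8 sample is 
less clear-cut.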

So although there's not really definitive proof, it looks like this may 
be specific to standalone kernel32:process runs on Windows 10.


Timeout
-------

The timeouts happened on the 1st or 2nd round while waiting for
'kernel32_test.exe process exit':
* 1x WaitForSingleObject(1s) of test_jobInheritance()
* 2x WaitForSingleObject(1s) of test_QueryInformationJobObject()

'process exit' has a 0.1s sleep which may not be strictly necessary.
Still there is no clear reason for it to not complete within the 
allotted 1s.

So I suspect something delayed 'process exit' either within the VM or 
outside it.

* An out-of-VM troublemaker should be uncorrelated with the in-VM 
  activity and thus hit WineTest runs and standalone kernel32:process 
  job tests with equal frequency... except if it's something related to 
  the VM revert / startup (e.g. SSD garbage collecting after the revert 
  I/O peak).

* Normally all in-VM troublemakers such as Windows Update, Defender, 
  Search are disabled [1]. Maybe there's still something running shortly 
  after the VM's clock gets changed that causes trouble.

Options:
* The simplest would be to increase the timeout to 2s for instance? This 
  should have essentially no impact on run time since we should not hit 
  the timeout in most cases.

* Automatically rerun any wine-dev task that has failures, hoping that 
  it will not fail on the second run.

  This would not be specific to this kernel32:process issue. 
  The drawbacks are that:
  - This risks letting in any test that fails less than 50% of the time.
  - This would delay emails notifying that a patch causes new failures. 
  - This would increase the TestBot load somewhat but that would likely 
    be manageable.

  I would also argue that this is not really necessary:
  - Now that intermittently failing tests are properly accounted for 
    this case should be quite rare.
  - When this happens one can simply analyze the test (and fix it?) and
    rerun the patch manually or resubmit it through wine-devel to prove 
    the failure was an unrelated fluke.
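
To put rough numbers on the automatic rerun option, the sketch below 
computes how often a task would still be reported as failing if it only 
counts when both the first run and the rerun fail. It assumes the two runs 
fail independently, which may well not hold if the cause is something like 
the post-revert I/O peak. The function name is hypothetical.

```python
def reported_failure_rate(per_run_rate, attempts=2):
    """Chance that every one of `attempts` independent runs fails,
    i.e. that the failure is still reported after the reruns."""
    return per_run_rate ** attempts


# 1/30 is the rate observed in this job; the others are illustrative.
for rate in (1 / 30, 0.25, 0.5):
    reported = reported_failure_rate(rate)
    print(f"per-run failure rate {rate:.1%}: "
          f"still reported {reported:.2%} of the time")
```

With these assumptions a real regression that only shows up half of the 
time would slip through in three cases out of four, which illustrates the 
first drawback listed above.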


Conversely some timeouts are pretty high: 30s and 60s. Presumably we 
hope they will never happen.
There is also a 1s Sleep(1000) which I don't really see the relevance of 
(in test_SuspendFlag()).


Trace mangling
--------------

Two of the timeouts happened in the absence of trace mangling. So the 
two issues seem unrelated.

The processes involved in the trace mangling are:

* w1064v1709 x3+, w1064v1607 x2, w1064v1809_fr x1
  the last 'process exit' started by test_WaitForJobObject()
  polluting the 'not waited for' parent trace
  -> This proves test_WaitForJobObject() is buggy.

* w1064v1709 x1
  'process exit' started by test_jobInheritance()
  polluting the WaitForSingleObject() timeout failure for that same 
  process in the parent.
  -> The only way to avoid that is to avoid the timeout?



[1] Here are the full notes for the base 1507 snapshot:

    Windows 10 1507 64-bit Home Edition.
    Uses a SCSI disk (0.1.164), e1000 network card, ich6 sound card, VGA 
    graphics card.
    Disabled the screensaver, disk and computer suspend, Windows update, 
    Windows defender, Windows search, restoration points, 
    defragmentation, telemetry and the CEIP, hibernation, swap, time of 
    last access (fsutil behavior set disablelastaccess 1).
    Added optional DirectX components.
    Autologin and TestAgent 1.7 autostart.

    [HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU]
    "NoAutoUpdate"=DWORD:1
    "AUOptions"=DWORD:2


-- 
Francois Gouget <fgouget at codeweavers.com>


