TestBot: A dive into the Windows timers
fgouget at codeweavers.com
Wed Mar 25 11:53:59 CDT 2020
On Tue, 24 Mar 2020, Zebediah Figura wrote:
> > * This means that based on just a few events one cannot expect the
> > interval between most events to fall within a narrow range. So here
> > for instance if the acceptable interval is 190-210 ms and the first
> > interval is instead 237 ms, then the next one will necessarily be out
> > of range too, and likely the one after that too. So expecting 2 out
> > of 3 intervals to be within the range is no more reliable than
> > checking just one interval.
> Allowing for more error than 10ms seems reasonable to me, even by an
> order of magnitude.
The test tolerances are not that tight, as far as I know, and certainly
not for this threadpool timer test. That was just me testing an
alternative approach and finding it not to be viable. As I said, in this
specific case the allowed range is 500-750 ms for an expected 600 ms
(3*200 ms).
But there are cases in other tests where we do a TerminateProcess() or
similar and expect the WaitForSingleObject() to return within 100 ms. I
don't think those are correct. Even 1 s feels too short. The recent
kernel32:process helper functions replaced a bunch of them with
wait_child_process() calls, so now the timeout is 30 s. I may align the
remaining timeouts with that... though I feel 30 s is a bit large. Surely
10 s should be enough?
> > * In QEmu, when the timer misses it often misses big: 437 ms, 687 ms,
> > even 1469 ms. So most of the time expecting three events to take about
> > 3 intervals does not help with reliability because the timer does not
> > try to compensate the missed events. So at the end it will still be
> > off by one interval (200 ms) or more.
> > * I could not reproduce these big misses on the Windows 8.1 on
> > cw-rx460 machine (i.e. real hardware).
> This is the real problem, I guess. I mean, the operating system makes no
> guarantees about timers firing on time, of course, but when we try to
> wait for events to happen and they're frequently late by over a second,
> that makes things very difficult to test.
> Is it possible the CPU is under heavy load?
Not really, no. There's really not much running on the VM hosts:
We run at most one VM at a time per host, precisely to make sure the
activity in one VM does not interfere with the tests running in the
other VM(s). Of course this makes the TestBot pretty inefficient, and it
also does not prevent these delays :-(
* Unattended upgrades
Once a day apt will check for security updates and install them. But
on Debian stable that should not amount to much.
* Acts of administrator
Mostly VM backup/restore, debugging, reconfiguring. But these are too
infrequent to explain all the delays we get.
Also I'm not convinced CPU load on the host is the cause of these
delays.
Francois Gouget <fgouget at codeweavers.com>