mmdevapi:render test failures
Francois Gouget
fgouget at codeweavers.com
Tue Nov 24 12:46:30 CST 2020
On Mon, 16 Nov 2020, Francois Gouget wrote:
[...]
> So rather than just apply these patches and blindly increase the margin
> factor I dug some more to try to understand what is really going on.
> I'll try to get my theories and findings in shape in my next email.
So my understanding is that on Windows things look like this:
App -> Muxer -> driver -> soundcard
But in QEmu the sound card is in fact something like this:
QEmu -> Spice Server --network--> Spice client -> PulseAudio -> driver -> soundcard
And in particular the Spice server to client communication requires some
buffering to deal with the network jitter. So I suspected that the QEmu
'soundcard' would initially consume some audio to fill its buffer and
then resume a normal rate. So I tweaked the test so it looked
like this:
    Start()
    Sleep(25)
    last = GetPosition()
    Sleep(100)
    pos = GetPosition()
    compare (pos - last) to the expected value for 100 ms
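As a sketch of the scheme, here is a runnable Python model (a model only:
the real test is C code querying the mmdevapi stream clock, and the
get_position_ms callback below is a hypothetical stand-in for
GetPosition() converted to milliseconds):

```python
import time

def measure_ratio(get_position_ms, rampup_s=0.025, window_s=0.100):
    """Model of the tweaked test: let the stream ramp up for rampup_s,
    then compare position progress against wall time over window_s."""
    time.sleep(rampup_s)                 # absorb the initial buffering
    last = get_position_ms()
    t0 = time.monotonic()
    time.sleep(window_s)
    slept_ms = (time.monotonic() - t0) * 1000
    pos = get_position_ms()
    return (pos - last) / slept_ms       # ~1.0 for nominal playback

# A fake position source that 'plays' at the nominal 1:1 rate.
start = time.monotonic()
nominal = lambda: (time.monotonic() - start) * 1000
ratio = measure_ratio(nominal)
assert 0.85 < ratio < 1.15
```

With a well-behaved position source the ratio comes out very close to
1.0; the interesting cases below are when the clock behind GetPosition()
does not track wall time.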
The theory was that QEmu would fill its buffer during the initial 25 ms
sleep and would then play at a regular rate during the next 100 ms. That
did help a bit, but not enough. Increasing the initial sleep to 50 ms
improved things a bit more.
So I tweaked the test further to map out the effect of the initial
sleep:
* Increase the initial sleep (called RampUpDelay) from 0 to 200 ms
in 5 ms increments.
* For each RampUpDelay value, run the test 40 times.
* Trace the position in milliseconds (Pos0) at the end of the initial
sleep. Also trace the actual time spent sleeping (Slept0).
* Then let the stream play for another 100 ms and measure how long
we actually slept (Slept).
* Get the final position (Pos) and trace the Pos / Slept ratio,
which we expect to be below 1.1 (or below 1.4 at the very worst).
Then I parsed the log to produce graphs that show the effect of the
RampUpDelay on Pos0 - Slept0 and on the Pos / Slept ratio.
I attached the corresponding patch which should be applied on top of
the ones in the previous emails. You will also find all the data and
graphs derived from these tests in the spreadsheet below:
http://fgouget.free.fr/tmp/mmdevapi-render.xls
And now here is an analysis of the results. Open the image attachments
to follow along.
* Win10 on real hardware: cw-gtx560 + ich6
Fig: render-cw.png
- The first test configuration is what happens with real hardware,
here the cw-gtx560 machine that runs WineTest daily. It has a very
standard ich6 integrated audio soundcard which happens to match the
sound card emulated by QEmu (ich6 or ich9).
- The top graph shows the Pos0-Slept0 values when playing audio
straight to the soundcard, that is when share=0. In that case
Pos0-Slept0 is essentially 0. The way I understand it is that the
Windows API queries the soundcard driver to report the soundcard's
playback position within the soundcard buffer.
- The other thing to notice is that although RampUpDelay is increased
in 5 ms increments the top graph shows that the Slept0 values are
really spread around. That proves Sleep() has at least a 5 ms
jitter. In fact we already know from my past timers email that it's
closer to 15 ms.
- The second graph shows the Pos0-Slept0 values again but when going
through the Windows muxer, i.e. the equivalent to PulseAudio.
- This time Pos0-Slept0 stays close to -10 ms which presumably
corresponds to the increments in which the muxer mixes the audio
from the various sources. And there's a bit more variation (0 to
-20 ms) when RampUpDelay is below 20 ms, most likely due to the
muxer's buffering.
- There are also a few stray points near the -30 ms mark. Maybe these
correspond to scheduling issues (cw-gtx560 was obviously idle but
there may still have been some Windows services doing their thing,
although most are normally disabled).
- Also there is less spread of the Slept0 values in the share=1 case:
instead of a continuous blur one sees short, almost vertical lines.
That contradicts the previous assertion about Sleep() jitter. Maybe
the difference is that the Windows muxer changed the global Windows
timer resolution. See timeBeginPeriod() and NtSetTimerResolution()
and the previous timer discussion:
https://www.winehq.org/pipermail/wine-devel/2020-March/162846.html
On a related note, see this blog post which implies the timer
management has changed a lot in Windows 10 2004. All QEmu tests were
done with Windows 10 2009 (aka 20H2) but the cw-gtx560 tests were
run in Windows 10 <= 1909 so a global timer change in the muxer
could have impacted the mmdevapi_test process.
https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/
- The last graph shows that in the share=1 case the ratio stays very
close to 1.0, except when RampUpDelay is less than 20 ms (i.e. the
current test configuration) where it can be as low as 0.9 and as
high as 1.1. These are probably the effect of the Windows muxer's
initial buffering.
- In the share=0 case the ratio stays very close to 1.0 all along so I
did not attach the corresponding graph.
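As a sanity check on that 0.9 to 1.1 band (this is my interpretation of
the graphs, not something the test measures directly): if the muxer
buffers audio in roughly 10 ms increments, the reported position can
lead or lag wall time by up to that much, which over a 100 ms
measurement window gives exactly the observed band:

```python
window_ms = 100
muxer_step_ms = 10            # assumed muxer mixing increment
low = (window_ms - muxer_step_ms) / window_ms
high = (window_ms + muxer_step_ms) / window_ms
print(low, high)              # 0.9 1.1
```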
* Win10 on QEmu 5.0 + ich9 + localhost Spice client
Fig: render-spice-local.png
- The test gives 500 ms worth of data to mmdevapi. But what we see in
the top graph is that mmdevapi does not instantly slurp 40 ms worth
of data. Rather Pos0-Slept0, i.e. the amount that is buffered,
increases progressively, which means the audio is being pulled at a
somewhat higher playrate than expected.
- Then after about 100 ms Pos0-Slept0 stabilizes between 25 and 45 ms.
So in effect once the QEmu 'soundcard' has filled its buffer the
playrate goes back to normal.
- Note how the above actually matches the 1.4 ratio we have had to use
in the mmdevapi:render test.
- The graph in the middle is the same as the top one but indexed on
Slept0, i.e. the real sleep wall time, rather than RampUpDelay, i.e.
the parameter passed to Sleep(). It is at once similar and yet
different from the graph above. In particular one can clearly see
that initially Pos0-Slept0 is quantized. That probably has something
to do with the audio being sent to the Spice client using the
maximum size allowed by the Spice protocol initially. Then once the
buffering target is reached, only enough data to refill the ~40 ms
buffer is sent and that is more dependent on the actual time
intervals and scheduling of the Spice server and client, resulting
in a more random distribution of points.
- The graph at the bottom shows the ratio. What it shows is that
initially the playrate is 40% higher than it should be, which
corresponds to the 40 ms of extra buffering. Then that falls down
progressively as more and more of our measurement interval is
performed when the 40 ms buffer is already full.
- Once the 40 ms buffer is full, i.e. after the initial 100 ms, the
ratio is very consistently between 1.0 and 1.1.
- The graphs when share=1 are very similar so I did not include them.
The notable differences are that Pos0-Slept0 stays in a narrower
band (25 to 40 ms), and that it takes an extra 20 ms for the
Pos/Slept ratio to fall safely within the 1.0 to 1.1 range.
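The observations above fit a simple model (all the constants are my
eyeball estimates from the graphs, not measured values): the QEmu
'soundcard' pulls an extra ~40 ms of audio, spread over the first
~100 ms of playback, and then runs at the nominal rate. The expected
Pos/Slept ratio for a 100 ms measurement starting after a given ramp-up
delay is then:

```python
def expected_ratio(rampup_ms, measure_ms=100, fill_ms=40, fill_window_ms=100):
    """How much of the ~40 ms buffer fill lands inside the measurement
    interval, expressed as a Pos/Slept ratio."""
    start = min(rampup_ms, fill_window_ms)
    end = min(rampup_ms + measure_ms, fill_window_ms)
    extra_ms = max(0, end - start) * fill_ms / fill_window_ms
    return (measure_ms + extra_ms) / measure_ms

print(expected_ratio(0))      # 1.4: all of the fill is measured
print(expected_ratio(50))     # 1.2: half of it is
print(expected_ratio(100))    # 1.0: the buffer is already full
```

This also shows why a 1.4 margin is what the test ends up needing: with
no ramp-up delay the entire buffer fill lands inside the 100 ms
measurement window.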
* Win10 on QEmu 3.1 + ich9 + remote Spice client
- If the buffering is meant to cover the network jitter, then more
buffering would be needed for a remote client with a higher RTT, and
presumably proportionally higher network jitter.
- So I repeated the test over a transatlantic connection with a 104 ms
RTT with little jitter (rtt min/max/mdev = 103.934/104.858/0.253 ms).
- The Pos0-Slept0 graph did show a slightly increased amount of
buffering in the share=0 case but not by much. The Windows muxer
still seems to insulate the tests from the buffering jitter and that
graph did not change.
- Because there's not much change I did not include these graphs.
* Win10 on QEmu 5.0 + ich9 + no Spice client
Fig: render-spice-noclient.png
- No client is connected to the Testbot VMs when they run Wine tests
(except when the TestBot administrator tries to debug them). So it's
interesting to test what happens when there is no Spice client. Also
in that case there is no need to do any buffering to cover the
network jitter or anything.
- The top graph shows Pos0-Slept0 in the share=0 case. Yikes!!!
After buffering the initial 40 ms Pos0-Slept0 takes a nose dive: the
'soundcard' just stalls. It seems like it's waiting for the Spice
client to drain the buffer. But there is no Spice client.
- Predictably the Pos/Slept ratio falls to 0 in the share=0 case so I
did not include this graph.
- The next graph shows what's happening with Windows' muxer. It's
essentially the same graph except that we can see when the muxer
gets scheduled and that the slope is not as steep, probably because
the muxer drops some data when its buffer is full.
- In the share=1 case the Pos/Slept ratio stabilizes around 0.3 thanks
to the Windows muxer dropping data.
- Notice how when there is no client the ratio can never be above 1.4.
That means ok(pos*1000/freq <= slept*1.4,...) can never fail. And
yet it does! See the next section for the explanation.
- Since the soundcard stalls, all the tests that expect Pos to continue
increasing will fail. And indeed that's what we see with w10pro64:
render.c:1310: Test failed: No position increase at iteration 1
- This is a QEmu 5.0 bug but I have not yet figured out if it has
already been reported.
* Win10 on QEmu 3.1 + ich9 + no Spice client
- The QEmu 5.0 results were very perplexing... until I realized that
most of the TestBot VMs run in QEmu 3.1. So I redid the tests with a
QEmu 3.1 Windows 10 VM.
- If you want to see the graphs just open render-spice-local.png. They're
again very similar.
- And that's why we get the '1.4 ratio' failures on most TestBot VMs.
But at least we avoid the 'No position increase' ones.
- The VM hosts running QEmu 3.1 have the same libspice-server1 package
(0.14.0-1.3+deb10u1) as the ones running QEmu 5.0. And the Spice
client was the same in both cases. That's why I think the bug is in
QEmu 5.0 rather than in Spice (unless it's a QEmu change that
revealed a Spice bug).
* Win10 on QEmu 5.0 + ich9 + VNC
- So then I tested what happens if the QEmu remote access method is
switched to VNC.
- Again I did not include the graphs because they are very
similar to the spice-local.png ones, whether a VNC client is
connected or not (but you can find them in the spreadsheet). The
timings are a bit looser: there's a 51 ms Pos0-Slept0 datapoint in
the no-client case but the ratio curve still remains between 1.0 and
1.1 after the initial 100 ms.
- When a VNC client is connected the spread is a bit higher with more
Pos0-Slept0 datapoints close to the 50 ms mark and some stray ratios
outside the 1.0 to 1.1 range below a 110 ms ramp-up delay. So
there's some difference between the VNC and Spice audio streaming
algorithms (and maybe a bit more buffering). But that would not be
an issue for the TestBot since there's no client normally.
* WinePulse cf49617 + Debian 10 on QEmu 5.0 + ich6 + localhost Spice
Fig: render-spice-wine.png
- Finally I looked at how the test behaves in Wine with PulseAudio.
Unfortunately the QEmu emulated hardware configuration is a bit
different (440fx+ich6 instead of q35+ich9) but I doubt it has an
impact.
- First note that with PulseAudio there is no direct access to
the soundcard so we only get results for share=1.
- The most striking thing about the Pos0-Slept0 graph is that, besides
a few exceptions, it has only 4 values: -5, -6, -10 or -11 ms!
I don't know if this should be attributed to Winepulse, PulseAudio
and/or Linux's much better scheduling.
- In any case there is absolutely no trace of the QEmu soundcard
buffering. So something is insulating the test from it. Again I'm
not sure if that's Winepulse, PulseAudio or the kernel itself.
- This graph also confirms that Sleep() has much less jitter than on
Windows, even when Windows runs on real hardware. Slept0 is always
within 1 ms of the Sleep() value. So an OS running in QEmu can have
pretty tight timings, at least 99% of the time.
- The ratio graph at the bottom is similar in that it has only three
values: 0.9, 1.0 and 1.1. Given that the timings are much tighter
than on Windows it's a bit strange that the ratio interval is not as
tight as on Windows. But it is still well within the range accepted
by this test.
- Not surprisingly the mmdevapi:render failures we used to get in
Wine were not timing related.
--
Francois Gouget <fgouget at codeweavers.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-mmdevapi-render-Explore-the-stream-ramp-up-duration-.patch
Type: text/x-diff
Size: 3228 bytes
Desc: 0004-mmdevapi-render-Explore-the-stream-ramp-up-duration-.patch
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20201124/0ef9670e/attachment-0001.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: render-cw.png
Type: image/png
Size: 67217 bytes
Desc: render-cw.png
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20201124/0ef9670e/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: render-spice-local.png
Type: image/png
Size: 96780 bytes
Desc: render-spice-local.png
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20201124/0ef9670e/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: render-spice-noclient.png
Type: image/png
Size: 73821 bytes
Desc: render-spice-noclient.png
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20201124/0ef9670e/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: render-spice-wine.png
Type: image/png
Size: 43985 bytes
Desc: render-spice-wine.png
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20201124/0ef9670e/attachment-0007.png>