mmdevapi: playing silence is introducing delays

Thu Aug 4 03:17:19 CDT 2011

Hi,

After much discussion, the ALSA devs identified 2 fundamental API needs:
a) How much data can I pump into audio NOW?
   That's what snd_pcm_avail_update is for.
b) When sync'ing audio and video, what frame is being heard NOW?
   That's what snd_pcm_delay is (or is intended to be) about.
http://mailman.alsa-project.org/pipermail/alsa-devel/2008-June/thread.html

PlaySound only cares about a) - pump data and be done.  In the
mmdevapi world, after much thinking (really :) I concluded that:
a) translates to GetCurrentPadding
b) translates to IAudioClock::GetPosition "the stream position of the
   sample that is currently playing through the speakers", says MSDN.
I can't foresee whether the introduction of IAudioClock2::GetDevicePosition
in Windows 7 means that the other was badly designed (or impractical) and
will be obsoleted.
winealsa.drv/mmdevdrv does not yet use snd_pcm_delay and that is IMHO
a sign of a bug.
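
To make a) and b) concrete, here is a minimal sketch in C.  It is only
a model of the two questions, not Wine or ALSA code: struct stream_model,
writable_now and heard_now are names made up for illustration, and the
fields merely stand in for what GetBufferSize, GetCurrentPadding and
snd_pcm_delay would report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a playback stream (not a Wine or ALSA type). */
struct stream_model {
    uint64_t frames_written;  /* total frames the app has submitted so far */
    uint32_t buffer_frames;   /* capacity, cf. GetBufferSize */
    uint32_t padding_frames;  /* queued, not yet consumed, cf. GetCurrentPadding */
    uint32_t delay_frames;    /* written but not yet audible, cf. snd_pcm_delay */
};

/* Need a): how many frames can be pumped in right now? */
static uint32_t writable_now(const struct stream_model *s)
{
    assert(s->padding_frames <= s->buffer_frames);
    return s->buffer_frames - s->padding_frames;
}

/* Need b): which frame is being heard right now?
 * Clamped to 0 while nothing has reached the speakers yet. */
static uint64_t heard_now(const struct stream_model *s)
{
    if (s->delay_frames >= s->frames_written)
        return 0;
    return s->frames_written - s->delay_frames;
}
```

The point of the model is that the two answers are independent: with a
large delay, writable_now can be positive while heard_now is still 0.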

Now consider the case of a remote desktop connection with bad latency.
An app may pump 2 seconds' worth of samples before any sound is heard
(you wouldn't like to use telnet there).
Such a scenario was not considered at the time the WINMM API was conceived.
Back then, a DA converter connected to speakers would scan a ring
buffer.  Apps could write directly into that buffer (DSound) or have
the OS copy data to it (WINMM) -- you all know that model.

So what does waveOutGetPosition return?
Back to the future, I don't know when to return a WAVEHDR from winmm:
  1) Should it wait until the sample went to the speakers, risking
    underruns because the app was not prepared for 2s network latency? Or
  2) Should it pump as fast as possible, returning buffers as soon as
    they are not needed anymore, e.g. after snd_pcm_writei? Or
  3) Select an intermediate way and rate-limit the speed at which
    buffers are returned to the app in an attempt to simulate an
    old-style low latency HW ring buffer?
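
The three options can be compared with a small model.  This is a sketch
under simplifying assumptions, not a proposed implementation: the enum
and return_time_ms are hypothetical names, the backend is assumed to
swallow up to latency_ms worth of data instantly, and each frame is
assumed to become audible latency_ms after its nominal playback time.

```c
#include <stdint.h>

/* Hypothetical names for options 1), 2) and 3) above. */
enum return_policy {
    RETURN_WHEN_HEARD,    /* 1) wait until the samples reached the speakers */
    RETURN_WHEN_WRITTEN,  /* 2) return as soon as the backend took the data */
    RETURN_RATE_LIMITED   /* 3) pretend to be a low latency HW ring buffer */
};

/* Time in ms after stream start at which a buffer whose last frame is
 * end_frame would be handed back to the app. */
static uint64_t return_time_ms(enum return_policy policy, uint64_t end_frame,
                               uint32_t rate, uint32_t latency_ms)
{
    /* Nominal playback time of the buffer's last frame. */
    uint64_t played_ms = end_frame * 1000 / rate;

    switch (policy) {
    case RETURN_WHEN_HEARD:
        return played_ms + latency_ms;
    case RETURN_WHEN_WRITTEN:
        /* Assumption: the backend accepts latency_ms worth of data at
         * once; only data beyond that has to wait for it to drain. */
        return played_ms > latency_ms ? played_ms - latency_ms : 0;
    case RETURN_RATE_LIMITED:
        return played_ms;
    }
    return played_ms;
}
```

For 3 buffers totaling 1s at 48000 frames/s and a 2s backend, this
model hands all three back at 0 ms under 2), at roughly 333, 666 and
1000 ms under 3), and 2000 ms later than that under 1).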

That concern is very real.  For instance, even without remote desktop,
PulseAudio has a huge 2-second buffer that will eat that much data
when starting up.  In case 2), the poor winmm app may not be prepared
for being immediately handed back the 3 WAVEHDRs it uses, totaling 1s
worth of samples, and may assume an underrun or totally lose sync.
You don't believe that might happen?  It is happening right now.

Mmdevapi is not winmm; however, my tests show that native mmdevapi does
not let you pump more than GetBufferSize samples ahead of time,
because the mixer slowly eats chunks of 10ms worth of samples.
Winealsa eats GetBufferSize + whatever ALSA uses, e.g. 2s with PulseAudio.
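
A toy simulation shows the difference in how far an app can get ahead
of playback.  It is a sketch with assumptions of my own: the function
name is hypothetical, the app refills the buffer completely on every
10ms tick, and backend_frames == 0 stands for the native case where
the mixer consumes the mmdevapi buffer directly.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Returns the largest number of frames the app manages to have written
 * but not yet played, over the given number of ticks (one tick = one
 * period).  backend_frames is the size of an extra buffer behind the
 * mmdevapi buffer (e.g. 2s for PulseAudio); 0 means no such buffer. */
static uint32_t max_lead_frames(uint32_t buffer_frames, uint32_t backend_frames,
                                uint32_t period_frames, unsigned ticks)
{
    uint32_t queued = 0;    /* frames sitting in the mmdevapi buffer */
    uint32_t backend = 0;   /* frames swallowed by the backend, not yet played */
    uint32_t max_lead = 0;

    for (unsigned t = 0; t < ticks; t++) {
        if (backend_frames == 0) {
            /* Native model: the mixer eats one period from the buffer. */
            queued -= min_u32(period_frames, queued);
        } else {
            /* Backend plays one period, then swallows whatever fits. */
            backend -= min_u32(period_frames, backend);
            uint32_t moved = min_u32(queued, backend_frames - backend);
            queued -= moved;
            backend += moved;
        }
        /* The app pumps as fast as it is allowed to: fill the buffer. */
        queued = buffer_frames;
        if (queued + backend > max_lead)
            max_lead = queued + backend;
    }
    return max_lead;
}
```

With a 100ms buffer (4800 frames at 48000 frames/s) the lead never
exceeds 4800 frames in the native model, while adding a 2s backend
(96000 frames) lets it grow to 100800 frames.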

I favour 3) for winmm, but it's not yet clear to me where the rate-limitation
should happen: mmdevapi and/or winmm.

Regards,
 Jörg Höhle