[PATCH variant 1] dsound: use a low-quality FIR for games
Alexander E. Patrakov
patrakov at gmail.com
Wed May 23 12:08:45 CDT 2012
2012/5/22 Andrew Eikum <aeikum at codeweavers.com>:
> Thanks Alexander. Thoughts below...
> On Sat, May 19, 2012 at 09:09:35PM +0600, Alexander E. Patrakov wrote:
>> There are two ways to implement a high-performance resampler, and I
>> have prepared (conflicting, pick no more than one) patches for both:
>> 1 (this patch): Use a shorter FIR with the existing code. This has the
>> advantage of higher quality (an attempt is at least made to reject
>> unwanted frequencies) and almost no new code.
>> 2 (the other patch): Write new code. E.g., linear interpolation. This
>> is what Windows XP does at its lowest quality setting, and it eats
>> less CPU than variant 1.
> Do you have an opinion on which of these patches to use? The
> low-quality FIR has the advantage of not introducing another codepath.
> On the other hand, the linear resampler codepath is very simple, and
> even easier on the CPU.
Yes. And Windows has two code paths as well. OTOH the linear resampler
has lower quality, and a different latency from the FIR-based filter.
Due to this latency difference there may be unavoidable clicks in
games (sorry, no concrete example) that frequently switch from 3 to 4
buffers and back. The FIR-based approach eliminates this effect,
because there is no latency difference (or, because this is untested,
better say: it's a bug in my code if there is any latency difference).
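For reference, the linear-interpolation path discussed above can be sketched roughly as follows. This is a hypothetical illustration under my own naming (resample_linear, freqAdjust as the input/output rate ratio), not Wine's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a linear-interpolation resampler: "pos"
 * advances by freqAdjust (input rate / output rate) per output
 * sample, and each output sample is a weighted mix of the two
 * nearest input samples.  Its latency is at most one input sample,
 * which differs from the group delay of a FIR-based filter. */
static void resample_linear(const float *in, size_t in_len,
                            float *out, size_t out_len,
                            double freqAdjust)
{
    double pos = 0.0;
    for (size_t i = 0; i < out_len; i++) {
        size_t idx = (size_t)pos;      /* integer part: left sample  */
        double frac = pos - idx;       /* fractional part: the weight */
        float s0 = in[idx];
        float s1 = (idx + 1 < in_len) ? in[idx + 1] : s0;
        out[i] = (float)((1.0 - frac) * s0 + frac * s1);
        pos += freqAdjust;
    }
}
```

Per output sample this is one multiply-add, versus X multiply-adds for an X-tap FIR, which is where the CPU saving comes from.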
> I'm leaning towards the linear resampler for its lower CPU usage
I have no real preference. If there are no other arguments (i.e. if
clicks due to switching resamplers are not a valid/worthy argument),
let's use the linear resampler, because I wrote it first and because
GyB tested it. Anyway, it doesn't really matter, because it is
possible to change this later, or even implement a 3-level quality
degradation strategy (long FIR -> short FIR -> linear interpolation).
As for your performance analysis - yes, get_current_sample() is cheap,
and the main cost is due to caching the FIR and calculating the
convolution. As far as I understand (but I can be wrong here), it
would be fair enough to count only the "sum += fir_copy[j] *
cache[j];" line. Still, I don't think it explains the whole picture.
Let's say that the FIR length is X samples of the lowest of the two
frequencies (X is a constant for a given FIR, and for my FIR it is
66). So, if upsampling, each output sample is affected by X input
samples, and by X * freqAdjust input samples when downsampling. Since
both GTA:SA and Darwinia use 32 buffers, we can consider only a single
buffer and count the number of passes through the "sum" line per
buffer.
Darwinia downsamples, with freqAdjust ~ 2..4. Thus, it executes the
"sum" line 2..4 * X times per output sample, i.e. 40000..90000 * X
times per second per input buffer.
GTA:SA upsamples, and thus executes the "sum" line X times per output
sample, i.e. 48000 * X times per second per input buffer.
The ratio seems to be consistent with the number of convolutions per
buffer per time step that you report, because the number of samples per
time step is different for these two games.
So I am not convinced by your analysis - but this is based on the
assumption that only the "sum += fir_copy[j] * cache[j];" line really
matters and that filling in fir_cache eats time proportional to the
summing (it has to go through the same number of iterations).
Alexander E. Patrakov