[PATCH 2/6] wined3d: Introduce WINED3DUSAGE_MANAGED.

Matteo Bruni matteo.mystral at gmail.com
Tue Nov 13 13:41:05 CST 2018


Sorry for the long reply, TLDR: behavior with D3DPOOL_SYSTEMMEM and
D3DPOOL_MANAGED buffers is not uniform among vendors, except for
DISCARD which is always ignored. Mapping D3DPOOL_SYSTEMMEM textures
never blocks.

Read further below for the gritty details...

On Sat, Oct 27, 2018 at 3:44 PM Stefan Dösinger
<stefandoesinger at gmail.com> wrote:
> On 27.10.2018 at 14:48, Henri Verbeet <hverbeet at gmail.com> wrote:
>
> As for the point you raise about map synchronisation, an implication
> of the above would be that mapping SYSTEMMEMORY buffers never blocks,
> beyond perhaps the draw-time upload.

That checks out, but interestingly only on Nvidia.
I hacked a pair of QueryPerformanceCounter() calls around the second
buffer map in the loop in test_map_synchronization() and augmented the
test to also try with D3DPOOL_SYSTEMMEM buffers. On Nvidia, the whole
Lock() / Unlock() dance is virtually instant (~2 μs according to
QueryPerformanceCounter(), which is clearly at the limits of its
resolution but nevertheless seems to give decently consistent and
usable results) for D3DPOOL_SYSTEMMEM, regardless of the map flags.
D3DPOOL_DEFAULT buffers behave as you would expect, i.e. mapping the
buffer without flags right after a "large" draw blocks (it takes ~100
ms for me), NOOVERWRITE map is almost as fast as the D3DPOOL_SYSTEMMEM
case (~2.5 μs), DISCARD takes just slightly longer (~20 μs). Ah,
updating the D3DPOOL_SYSTEMMEM buffer with NOOVERWRITE (or otherwise)
won't update the data in use by the draw, so the map is "synchronized"
as far as the test is concerned.
I also tested D3DPOOL_MANAGED and the results probably make sense
too, although they aren't entirely what I expected. The no-flags map
case takes 1 ms for me, while the others usually take around 160 μs
(although I have seen those sporadically take ~500 μs too). The 0
flags case in particular takes way longer than the SYSTEMMEM case but
still 2 orders of magnitude less than the D3DPOOL_DEFAULT case. I
guess one possible way to explain it is that, for managed buffers, the
driver needs to copy the buffer back to system memory but doesn't need
to wait for the draw to complete (at least on the GPU, I guess it
might need to complete "dispatching" the draw to the GPU, whatever
that means).
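
For reference, the timings above come down to converting a tick delta
into microseconds. Below is a minimal sketch of that arithmetic, not
the actual test hack: ticks_to_us() is a hypothetical helper, and on
Windows the start / end values would come from
QueryPerformanceCounter() calls placed around the map, with the
frequency taken from QueryPerformanceFrequency().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a performance-counter delta to
 * microseconds. "freq" is the counter frequency in ticks per second,
 * as QueryPerformanceFrequency() would report it on Windows. */
static double ticks_to_us(int64_t start, int64_t end, int64_t freq)
{
    assert(freq > 0);
    return (double)(end - start) * 1000000.0 / (double)freq;
}
```

With an assumed 10 MHz counter, a delta of 20 ticks comes out as 2 μs,
so a measurement of that size is only a handful of ticks and sits
close to the counter's resolution.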

AMD, on the other hand, doesn't behave like that. Mapping a
D3DPOOL_SYSTEMMEM buffer without the NOOVERWRITE flag does block to
some degree. OTOH, mapping a SYSTEMMEM buffer with NOOVERWRITE is
unsynchronized, i.e. updating data used by the draw will affect the
draw results. MANAGED buffers seem to have the same performance
characteristics as SYSTEMMEM WRT maps, including NOOVERWRITE having a
visible effect.

> One test I'd find interesting
> would be to compare the performance characteristics of draws of
> various sizes out of huge MANAGED and SYSTEMMEMORY buffers.

Just one more hack to test_map_synchronization() and there you are :)
I added one more QPC() call before the draw and restructured the test
to create increasingly large buffers, both drawing just from a portion
of the buffer and drawing from the whole buffer. On Nvidia, map time
for D3DPOOL_MANAGED buffers is proportional to the size of the
buffer and not affected by the triangle count. Draw time for
D3DPOOL_SYSTEMMEM, on the other hand, is proportional to the
triangle count and not affected by the buffer size. I think that also
matches our understanding, with the driver only uploading the data
strictly required by the draw. I haven't tested it yet but I assume
that it works similarly for indexed draws, where d3d can exploit the
min vertex index + vertex count to only upload the required subset of
the vertex buffer.
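
To illustrate the indexed draw assumption, here is a rough sketch of
how the required byte range could be derived from the
DrawIndexedPrimitive() arguments. It is only a guess at what a driver
might do, not something I verified: struct upload_range and
indexed_draw_upload_range() are made-up names, and the stream offset
is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: byte range of a vertex buffer a draw reads. */
struct upload_range
{
    size_t offset; /* byte offset of the first referenced vertex */
    size_t size;   /* number of bytes covered by the draw */
};

/* Assumed mapping from DrawIndexedPrimitive() parameters:
 * min_vertex_index is relative to base_vertex_index, and
 * vertex_count vertices follow it. stride is the vertex size. */
static struct upload_range indexed_draw_upload_range(int base_vertex_index,
        unsigned int min_vertex_index, unsigned int vertex_count,
        unsigned int stride)
{
    struct upload_range range;
    long long first = (long long)base_vertex_index + min_vertex_index;

    assert(stride > 0 && first >= 0);
    range.offset = (size_t)first * stride;
    range.size = (size_t)vertex_count * stride;
    return range;
}
```

E.g. with a base vertex index of 100, a min vertex index of 20, 50
vertices and a 32 byte stride, only 1600 bytes starting at byte offset
3840 would need to be uploaded, regardless of the total buffer size.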
The only other significant change with larger buffers / draws is the
map time for no flags D3DPOOL_DEFAULT buffer maps, which is
proportional to the triangle count. That makes perfect sense, the map
has to wait for the draw to complete. No other draw or map duration
changes significantly with larger buffer sizes / triangle
counts.
On AMD, map duration is not measurably affected by buffer size or
triangle count for any buffer pool / flag combination, aside from the
D3DPOOL_DEFAULT no-flags case, which blocks until the previous draw has
completed. Not much to see with draw duration either: draws are
generally "instant" with DEFAULT and SYSTEMMEM pool buffers and take a
bit longer (on the order of 100 μs) with MANAGED. No significant
changes with different buffer size and triangle count values.

I guess all of this means that applications need to cope with both
behaviors (or, more likely, don't care) and we can probably get away
with pretty much anything.

> From what I have seen in real games (e.g. World of Warcraft, Call of
> Duty Modern Warfare 2) textures are probably more interesting here
> than buffers. Both games use UpdateTexture with sysmem,
> D3DUSAGE_DYNAMIC source textures that they later map with DISCARD.
> When I worked on the command stream I honored that DISCARD flag, but
> I never wrote tests to show that it is correct to do so.

Good point. I wrote another quick test and it looks like
D3DPOOL_SYSTEMMEM texture maps never block. Actually the DISCARD flag
seems to be ignored in the case of D3DPOOL_SYSTEMMEM textures: texture
data is unchanged from the previous map. This seems to be the case for
both Nvidia and AMD.
Perhaps interestingly, the UpdateTexture() call also never blocks, as
far as I can see. Nothing surprising otherwise, except that apparently
the readback after a draw seems to take longer if the texture data was
actually changed compared to just mapped and not modified (e.g. it's
pretty consistent at 2 - 2.5 ms vs 3.7 ms on Nvidia). I guess I
shouldn't read too much into it.

To complete the test coverage of DISCARD, I also wrote a test for
buffers. It turns out that DISCARD is ignored for SYSTEMMEM or MANAGED
buffers: the map pointer and the buffer contents are unchanged after
the DISCARD map.

If it's useful I can clean up those tests / hacks a bit and share
them. Otherwise I'm probably going to make proper tests only for the
DISCARD thing (i.e. what's not timing-related).


