Implement pinned memory for ATI where buffer regions are slow

Sun Feb 17 08:40:33 CST 2013

Am 17.02.2013 um 14:25 schrieb Stanislaw Halik <sthalik at misaki.pl>:
> Security impact is nil - Win32 can reference the extension regardless. Well-written code won't expose the issue. Patch is a bit messed up WRT lack of binding back to 0 in some places, but that'll be amended.
I basically agree with this - the problem with this extension is that the driver exports it. If we're using it or not doesn't really have a security impact.

That said, I still prefer to make sure fglrx's ARB_map_buffer_range is crappy and unfixable before using AMD_pinned_memory. To do so, please test the impact of disabling ARB_map_buffer with other games. There are many free games and benchmarks that allow you to do that. And if you have access to a Nvidia GPU or can migrate your system to r600g for testing, test the application with r600g.

> Why is it included for every mapping in the first place, even for programs who never need it?
The problem isn't that glMapBufferRange is used for maps, but that we create a VBO for dynamic buffers instead of drawing from system memory.

Why do we want that? For one, with correct app/driver behavior this is faster. Sysmem draws are not available in OpenGL 3 core contexts. Many drivers(All OSX ones, Nvidia on Linux) are really slow if you mix VBOs and sysmem(e.g. when a static buffer and a dynamic buffer are mixed during draws).

> Why not do a provisionary solution if issue persisted since 1.3.x? Not to mention performance gains. All I want is to fix a regression so there's no need to maintain a local patch set.
Because we try really hard to avoid hacky workarounds. Once you put one in for one app, hundreds of users will demand one for their app/driver combination, and before you know it you'll just be maintaining hacks vs making Wine and the drivers better.

Regarding your pinned_memory patch:
As I understand the extension, it is enough to allocate the initial memory with GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD as bind target. This is some strange API design and needs confirmation, but ok. You shouldn't have to re-allocate the buffer every time you change it.

So in essence, allocate the buffer with GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD in buffer_create_buffer_object. Set SFLAG_DOUBLEBUFFER (to preserve HeapMemory and skip GL buffer mapping / unmapping in map() and unmap()). Set a new flag SFLAG_AMD_PINNED, and skip recording dirty areas in map() if this is set. Only do this with D3DUSAGE_DYNAMIC buffers, regular buffers shouldn't use PINNED_MEMORY at all.

The tricky part is handling DISCARD maps, and maps that don't pass DISCARD or NOOVERWRITE. For maps where the app sets neither flag, use the fence code that's in place for APPLE_flush_buffer_range. For DISCARD maps either use the fences as well (slow), or allocate a buffer twice the size, and switch between the first and second half of the allocated memory area on DISCARD maps. You still have to use fences (one per buffer copy), but ideally the GPU will be done reading the first buffer copy by the time the second copy is full and you have to go back to the first one.