Access to graphics memory mappings on upcoming WOW64 implementation

Mon Apr 25 00:31:51 CDT 2022

On 4/24/22 21:18, Derek Lesho wrote:
> Hi All,
> 
> In the wake of the new WOW64 implementation (recent explanation [1]), 
> there has been discussion in informal channels about how we are going 
> to handle pointers to mapped graphics resource memory that we receive 
> from the graphics API, since that memory may fall outside of the 
> 32-bit address space.
> 
> Over time, a few creative solutions have been proposed and discussed, 
> with a common theme being that we need changes in either the kernel or 
> the graphics drivers to do this properly.  As we already know the 
> requirements for a solution to this problem, I think it would be 
> responsible to hash this out now and work with the relevant project 
> maintainers early, so as to avoid blocking work on the Wine side for too 
> long and possibly to let more users test the new path sooner.

Thank you for starting this conversation! I agree with all of these 
points. WoW64 emulation is still a long way off, if it'll even happen by 
default on platforms other than Mac, but nevertheless this is something 
we should look into supporting sooner rather than later.

It would probably be good to start a dri-devel/mesa-dev thread to 
discuss this as well.

> 
> The solutions which I've seen laid out so far:
> 
> - Use the mremap(2) interface, allowing us to duplicate the mapping we 
> receive into the 32-bit address space.  This solution would match what 
> is already done for Crossover Mac's 32on64 support using Mac's 
> mach_vm_remap functionality [2].  However, right now it is not possible 
> to use the MREMAP_DONTUNMAP flag with mappings that aren't private and 
> anonymous, which rules out its use on FDs mapped through libdrm.  Due to 
> this, a kernel change would be necessary.
> 
>      Pro: A uniform solution across all APIs, which could help in the 
> future with any unforeseen need to access host-allocated memory in 
> 32-bit Windows code.
> 
>      Cons: Requires a kernel change, which of all the options may take 
> the longest to get upstreamed and into the hands of users.

Frankly, I think it may be worth looking into this even if we do try to 
implement another solution for GPU mappings specifically. As you say, it 
may potentially come in useful in other places.
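
A minimal sketch of the mremap(2) route, assuming Linux headers where MREMAP_DONTUNMAP may be missing (the fallback #define uses the value from <linux/mman.h>). reserve_low() and mirror_low() are hypothetical helpers: the first stands in for Wine's own low-memory reservation, the second moves the page tables into that reservation while MREMAP_DONTUNMAP leaves the original VMA in place. For a private anonymous mapping this means the data moves (the old range is refilled with zero pages on access), so only shared mappings end up with two views of the same pages. The mremap(2) man page says the flag was limited to private anonymous mappings before Linux 5.13 and is still refused for VM_DONTEXPAND or VM_MIXEDMAP mappings, so the sketch simply returns NULL when the kernel refuses.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4 /* value from <linux/mman.h>; Linux 5.7+ */
#endif

#define LOW_LIMIT 0x100000000ull /* 4 GB */

/* Hypothetical stand-in for Wine's low-memory reservation: find a free
 * PROT_NONE range below 4 GB by trying address hints. */
static void *reserve_low(size_t len)
{
    for (uintptr_t hint = 0x10000000; hint + len <= LOW_LIMIT;
         hint += 0x10000000) {
        void *p = mmap((void *)hint, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        if ((uintptr_t)p + len <= LOW_LIMIT)
            return p;
        munmap(p, len); /* hint was not honoured; try the next one */
    }
    return NULL;
}

/* Hypothetical: give the page-aligned mapping at src a second address
 * below 4 GB. Returns NULL if the kernel rejects the remap. */
static void *mirror_low(void *src, size_t len)
{
    void *dst = reserve_low(len);
    if (!dst)
        return NULL;
    /* MREMAP_FIXED replaces our reservation at dst; MREMAP_DONTUNMAP
     * (requires MREMAP_MAYMOVE and old size == new size) keeps the
     * VMA at src instead of unmapping it. */
    void *r = mremap(src, len, len,
                     MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP, dst);
    if (r == MAP_FAILED) {
        munmap(dst, len);
        return NULL;
    }
    return r;
}
```

In Wine this would sit behind the graphics thunks: map through the driver as usual, then hand mirror_low()'s result to 32-bit code, falling back to another strategy on NULL.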

In fact, in general I think looking into multiple solutions, and being 
able to fall back from one to another, is not necessarily a bad idea.

Also: it may be worth looking into kernel extensions other than 
mremap(2). We already have to deal with the problem of reserving the low 
2 GB for Win32 memory, and our current solutions to that can cause 
problems (I was recently bitten by this, in bug 52840 [1]).

A personality switch or pair of switches like "map everything under 2/4 
GB" and "prefer mapping above 2/4 GB" would be helpful, so that we can 
force mappings under 2 GB in NtAllocateVirtualMemory() and for GPU 
mappings, and prefer mappings above 2 GB otherwise. Unlike extending 
mremap(2), these would be useful for normal allocations as well, i.e. 
they'd allow us to do a better job of placing system libraries where we 
want them.

See also below s.v. ADDR_LIMIT_32BIT.
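
For comparison, Linux already has a per-call version of the first switch on x86-64: the MAP_32BIT flag to mmap(2) places a mapping within the first 2 GB of the address space. A personality switch would in effect apply this to every mapping in the process, including those made inside GPU drivers, where we can't pass the flag ourselves. A minimal sketch, assuming x86-64 Linux; map_below_2g() is a hypothetical helper, and the check on the returned address is purely defensive.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: anonymous read/write memory below 2 GB,
 * or NULL if that isn't available on this platform. */
static void *map_below_2g(size_t len)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p + len > 0x80000000ull) { /* defensive check */
        munmap(p, len);
        return NULL;
    }
    return p;
#else
    (void)len; /* MAP_32BIT is only defined on x86-64 */
    return NULL;
#endif
}
```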

[1] https://bugs.winehq.org/show_bug.cgi?id=52840

> 
> - Work with Khronos to introduce extensions into the relevant APIs 
> enabling us to tell drivers where in the address space we want resources 
> mapped.
> 
>      Pro: Wouldn't require going behind the driver's back, resulting 
> in a more robust solution.  (It's far-fetched, but what if a creative 
> driver returns a mapping without read or write permission and handles 
> accesses through a page fault handler?)
> 
>      Cons: The extension would have to be implemented by each individual 
> vendor for every relevant API.  This would implicitly drop support for 
> those with cards whose graphics drivers are no longer being updated.
> 
> - Hook the driver's mmap call while we invoke the API's memory mapping 
> function, overriding the address with one in the 32-bit address space.
> 
>        Pro: Unlike the other solutions, this wouldn't require any 
> changes to other projects, and shares the advantage of the first solution.
> 
>        Cons: Susceptible to breakage if the driver uses its own 
> mapping mechanism separate from mmap (a custom ioctl, or a CPU-based 
> driver returning memory from the heap).
> 
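
A minimal sketch of the hooking idea, assuming the driver calls mmap through something we can redirect. The actual interception mechanism (LD_PRELOAD, patching the driver's GOT entry, etc.) is out of scope; here the driver's mmap is modelled as a function pointer, and hooked_mmap(), fake_driver_map(), wine_map_resource() and the in_gpu_map_call flag are all hypothetical. The per-thread flag limits the override to mappings made while Wine is inside a graphics API map call.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

#define LOW_LIMIT 0x100000000ull /* 4 GB */

/* Hypothetical: set while Wine is inside a graphics API map call. */
static _Thread_local int in_gpu_map_call;

/* Replacement for the driver's mmap: while the flag is set, retry
 * address-agnostic requests with hints below 4 GB. */
static void *hooked_mmap(void *addr, size_t len, int prot, int flags,
                         int fd, off_t off)
{
    if (in_gpu_map_call && !addr && !(flags & MAP_FIXED)) {
        for (uintptr_t hint = 0x10000000; hint + len <= LOW_LIMIT;
             hint += 0x10000000) {
            void *p = mmap((void *)hint, len, prot, flags, fd, off);
            if (p == MAP_FAILED)
                return p;
            if ((uintptr_t)p + len <= LOW_LIMIT)
                return p;
            munmap(p, len); /* hint not honoured; try the next one */
        }
        /* No room below 4 GB: fall through to an unconstrained mapping,
         * which the caller must detect and handle. */
    }
    return mmap(addr, len, prot, flags, fd, off);
}

/* Stand-in for the driver's mmap entry, redirected to our hook. */
static void *(*driver_mmap)(void *, size_t, int, int, int, off_t) = hooked_mmap;

/* Stand-in for a driver's "map this buffer" path. */
static void *fake_driver_map(size_t len)
{
    void *p = driver_mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* What a Wine thunk around e.g. vkMapMemory might do. */
static void *wine_map_resource(size_t len)
{
    in_gpu_map_call = 1;
    void *p = fake_driver_map(len);
    in_gpu_map_call = 0;
    return p;
}
```

Redirecting only hint-less, non-MAP_FIXED requests keeps the hook from disturbing mappings the driver places deliberately. Calling fake_driver_map() directly, outside wine_map_resource(), gets an ordinary unconstrained mapping.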

Here are a few other ideas / considerations I think are worth mentioning:

- Reserve the entire address space above 2G (or 3G with the appropriate 
image flags). This is essentially what we already do for 32-bit 
programs. I'm not sure if reserving the remaining ~2**47 bytes (the 
128 TiB x86-64 user address space) will run into problems, though? Has 
this been tried?

- Linux has a personality(2) switch ADDR_LIMIT_32BIT. The documentation 
is terse, so I'm not fully sure what this does, but it might be 
sufficient to ensure that new mappings are placed under 2 GB, while not 
breaking old mappings? And presumably it's also toggleable. It's not 
exactly ideal (we'd like to be able to set a 3 GB or 4 GB limit instead 
if the binary allows), but it's potentially already usable.

- We can emulate mappings for everything except coherent memory by 
manually implementing mapping functions with a separate sysmem location. 
We can implement persistent mappings this way, too, by copying on a 
flush, but unfortunately we can't expose GL_ARB_buffer_storage without 
coherent mappings.

   [Fortunately d3d doesn't require coherent memory or 
ARB_buffer_storage, and the Vulkan backend doesn't require coherent 
memory for map acceleration. The GL backend currently does, but could be 
made not to. We'd have to add a private extension to use 
ARB_buffer_storage while not actually marking any maps as coherent. Of 
course, d3d isn't the only user of GL or Vulkan, and unfortunately 
ARB_buffer_storage is core in GL 4.4, so I'm sure there are GL applications
out there that rely on it...]

   I think we can actually emulate coherent memory as well, by tracking 
resource bindings and manually flushing on draws. That's a little 
painful, though.

- Crazy idea: On Linux, parse /proc/self/maps to allow remapping 
non-anonymous pages. Combined with mremap(2) or manual emulation, this 
allows mapping everything except for shared anonymous pages [and I can't 
imagine that a GPU driver would use those, especially given that the 
only way to make use of the SHARED flag is fork(2)].
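
The "separate sysmem location" emulation described above can be sketched as follows, as a minimal illustration only: struct shadow_map and the shadow_* functions are hypothetical names, the low buffer is assumed to come from Wine's reserved low memory, and in use plain buffers can stand in for the driver mapping. Reads are served by copying host to shadow when mapping (or on invalidate), and writes reach the driver's mapping only on flush or unmap, which is exactly why coherent mappings, where the GPU may observe writes at any time, can't be emulated this way without extra tracking.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping for one emulated (non-coherent) map. */
struct shadow_map {
    unsigned char *host;   /* pointer from the driver, may be above 4 GB */
    unsigned char *shadow; /* buffer below 4 GB handed to the application */
    size_t size;
};

/* Clamp [off, off+len) to the mapped size; 0 if entirely outside. */
static size_t clamp_len(const struct shadow_map *m, size_t off, size_t len)
{
    if (off >= m->size)
        return 0;
    return len > m->size - off ? m->size - off : len;
}

/* Map: seed the shadow copy with the current contents. */
static void *shadow_map_begin(struct shadow_map *m, void *host,
                              void *low_buf, size_t size)
{
    m->host = host;
    m->shadow = low_buf;
    m->size = size;
    memcpy(m->shadow, m->host, size);
    return m->shadow;
}

/* Flush (cf. vkFlushMappedMemoryRanges, glFlushMappedBufferRange):
 * push application writes in [off, off+len) to the real mapping. */
static void shadow_flush(struct shadow_map *m, size_t off, size_t len)
{
    len = clamp_len(m, off, len);
    if (len)
        memcpy(m->host + off, m->shadow + off, len);
}

/* Invalidate: pull GPU-side changes in [off, off+len) into the shadow. */
static void shadow_invalidate(struct shadow_map *m, size_t off, size_t len)
{
    len = clamp_len(m, off, len);
    if (len)
        memcpy(m->shadow + off, m->host + off, len);
}

/* Unmap: write the whole range back to the real mapping. */
static void shadow_map_end(struct shadow_map *m)
{
    shadow_flush(m, 0, m->size);
}
```

Writing everything back on unmap is the simplest policy; a real implementation would likely skip it for read-only maps.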
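
A sketch of the /proc/self/maps idea, assuming Linux: look up the file and offset backing a shared file mapping, reopen the file, and map the same pages again below 4 GB. find_backing() and remap_low() are hypothetical helpers. Only shared ("s") mappings are handled, since re-mapping a private file mapping would lose any copy-on-write changes. Files shown as "(deleted)", including memfds, can't be reopened by path this way, and device files such as DRM nodes may not accept a mapping offset through a freshly opened descriptor, so treat this as an illustration of the parsing rather than a complete solution.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define LOW_LIMIT 0x100000000ull /* 4 GB */

/* Hypothetical: find the shared file mapping containing [addr, addr+len)
 * and return its path and the file offset corresponding to addr. */
static int find_backing(const void *addr, size_t len, char *path,
                        size_t path_len, off_t *offset, int *writable)
{
    char line[PATH_MAX + 128];
    int ret = -1;
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    while (ret && fgets(line, sizeof line, f)) {
        unsigned long start, end, off;
        char perms[5];
        int name = 0;
        /* Format: "start-end perms offset dev inode   pathname" */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &start, &end, perms, &off, &name) < 4 || !name)
            continue;
        if ((uintptr_t)addr < start || (uintptr_t)addr + len > end)
            continue;
        line[strcspn(line, "\n")] = 0;
        if (perms[3] != 's' || line[name] != '/' ||
            strstr(line + name, " (deleted)"))
            break; /* found the mapping, but can't reopen it */
        snprintf(path, path_len, "%s", line + name);
        *offset = (off_t)(off + ((uintptr_t)addr - start));
        *writable = perms[1] == 'w';
        ret = 0;
    }
    fclose(f);
    return ret;
}

/* Hypothetical: map the pages behind [addr, addr+len) a second time
 * below 4 GB. addr and len must be page-aligned. */
static void *remap_low(void *addr, size_t len)
{
    char path[PATH_MAX];
    off_t off;
    int writable;
    if (find_backing(addr, len, path, sizeof path, &off, &writable))
        return NULL;
    int fd = open(path, writable ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return NULL;
    int prot = PROT_READ | (writable ? PROT_WRITE : 0);
    void *low = NULL;
    for (uintptr_t hint = 0x10000000; !low && hint + len <= LOW_LIMIT;
         hint += 0x10000000) {
        void *p = mmap((void *)hint, len, prot, MAP_SHARED, fd, off);
        if (p == MAP_FAILED)
            break;
        if ((uintptr_t)p + len <= LOW_LIMIT)
            low = p;
        else
            munmap(p, len); /* hint not honoured; try the next one */
    }
    close(fd); /* the new mapping keeps its own reference to the file */
    return low;
}
```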

Farewell,
Zeb