Implementing AllocateUserPhysicalPages (Idea)

Stefan Dösinger stefandoesinger at gmail.com
Sat Apr 17 09:40:03 CDT 2021


Hi,

I am not familiar with these functions (I just read the MSDN pages); it
seems their intention is to allow 32-bit processes to handle more than
2/3/4 GB of memory. Do you have an application that actually uses these
functions and isn't happy with the stubs?

I wonder what the difference is compared to
CreateFileMapping(INVALID_HANDLE_VALUE) + MapViewOfFile. This supposedly
creates a memory-backed file handle and the process can map/unmap it at
will (and even pass it to a different process). I see a difference in
wording ("backed by the system paging file" vs "physical memory"), but
unless you peek into the kernel internals I don't think an application
should notice the difference.

Regarding allocating and mapping memory in foreign processes,
NtAllocateVirtualMemory() can do this as well. It doesn't have the
functionality to create a memfd-like allocation, but you can look at how
the cross-process mapping works via wineserver.

Wrt the exact semantics of those physical page allocations I guess it
depends on what applications actually need. memfd sounds fine, and we
can ignore the swap/mlock until an application actually bothers about it.
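
A minimal sketch of the memfd behaviour this relies on, assuming Linux
with glibc 2.27 or later (which provides the memfd_create() wrapper).
The helper name memfd_remap_demo and the "awe-demo" label are made up
for illustration; nothing here is existing Wine code. It only shows
that a page keeps its contents between an unmap and a later map:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if data written through one mapping of a
 * memfd page is still present after unmapping and mapping the page again. */
int memfd_remap_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* The name is only a debugging label; it does not touch the file system. */
    int fd = memfd_create("awe-demo", MFD_CLOEXEC);
    if (fd < 0) return -1;
    if (ftruncate(fd, page * 2) < 0) { close(fd); return -1; }

    /* Map the second page of the file, write to it, and unmap it again. */
    char *view = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page);
    if (view == MAP_FAILED) { close(fd); return -1; }
    strcpy(view, "still here");
    munmap(view, page);

    /* A later mapping of the same file offset sees the same contents. */
    view = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, page);
    if (view == MAP_FAILED) { close(fd); return -1; }
    int ok = strcmp(view, "still here") == 0;

    munmap(view, page);
    close(fd);
    return ok ? 0 : -1;
}
```

Nothing is locked in RAM here, which matches the idea of ignoring the
swap/mlock question until an application depends on it.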

I found bug 36527, but it is marked as fixed with the patch that
implemented the stubs. It isn't clear to me if the games and office
diagnosis service are working correctly with the stub or not.

Cheers,
Stefan


PS: I also like the seeming throwback to DOS days in the
AllocateUserPhysicalPages description:

"Do not attempt to modify this buffer. It contains operating system
data, and corruption could be catastrophic."

On 17.04.21 at 09:50, Keith Cancel wrote:
>  I hope this email is not too long, but Linux does not quite
> have the functionality of these functions:
> 	AllocateUserPhysicalPages()
> 	MapUserPhysicalPagesScatter()
> 	MapUserPhysicalPages()
> 	FreeUserPhysicalPages()
> 
> I noticed that they are stubs, and thus unimplemented. I have an idea
> for how to implement them, although I am not entirely familiar with
> the Wine code base. Let's start with AllocateUserPhysicalPages(). It
> needs to do a few things:
> 	* A process needs to be able to reserve pages for another process
> (aka the HANDLE hProcess parameter)
> 	* Reserve memory without adding it to the virtual address space of
> the caller or the target process.
> 	* Said memory is locked, aka mlock()ed, and won't be swapped.
> 	
> So the first thing is that any Windows process with permission to do
> so needs to be able to reserve memory for other processes, including
> itself. It makes sense to have what is currently a stub invoke what I
> describe next in the Wine server. That way, when the target process
> tries to map the reserved memory, it can fetch the info it needs from
> the Wine server. So the first thing the handler in the Wine server
> would need to do is:
> 	* Check that the invoking Windows process is allowed to do this (aka
> SeLockMemoryPrivilege, and PROCESS_VM_OPERATION on the handle), or just
> grant all processes this ability.
> 	* Check that the target Windows process exists.
> 
> However, now we need to reserve memory that can be mapped and unmapped
> multiple times in the target process without losing the memory
> contents. The simplest way to do this would be a file. However, a file
> resides on disk, and remapping it after an unmap can be slow. It also
> has the potential to pollute the file system if cleanup fails.
> Luckily, Linux has the memfd_create() system call. This creates a
> RAM-backed file and returns a file descriptor that can later be
> passed to mmap(). This lets us create a persistent bit of memory that
> does not pollute either the caller's or the target's address space. I
> assume there is a per-process structure in the Wine server where we
> could store this file descriptor. We can also just use ftruncate() to
> set the size to the number of bytes requested. However,
> FreeUserPhysicalPages() makes things a tad more complicated: it can,
> for instance, be used to free only a single page worth of memory. So
> we need to track free page-sized chunks. Luckily, we are only tracking
> fixed-size memory blocks, so it's not as bad as it could be. So in
> C-ish pseudo code, do the following for AllocateUserPhysicalPages():
> 
>     remaining_pages  = NumberOfPages; // NumberOfPages is a function parameter
>     // If there is a structure like this, get it
>     proc_struct      = get_process_struct(win_process_id/handle);
>     page_array_index = 0; // Track what index we have written to
>     if(proc_struct.memfd == -1) {
>         proc_struct.memfd          = memfd_create("debug name", {FLAGS});
>         proc_struct.memfd_sz       = 0;
>         proc_struct.free_list_head = NULL;
>     }
>     // Check the free list first, since FreeUserPhysicalPages() can free,
>     // for instance, a single page
>     next_ptr = proc_struct.free_list_head;
>     while(next_ptr != NULL && remaining_pages > 0) {
>         // PageArray is also a function parameter
>         PageArray[page_array_index] = {However the IDing is done for pages};
> 
>         next_ptr = next_ptr->next;
>         proc_struct.free_list_head = next_ptr;
> 
>         page_array_index++;
>         remaining_pages--;
>     }
> 
>     if(remaining_pages > 0) {
>         old_size = proc_struct.memfd_sz;
>         new_size = old_size + (page_sz * remaining_pages);
>         proc_struct.memfd_sz = new_size;
>         ftruncate(proc_struct.memfd, new_size);
>         for(int i = 0; i < remaining_pages; i++) {
>             // Calculate the ID starting from old_size and counting upward
>             PageArray[page_array_index++] = {However the IDing is done for pages};
>         }
>     }
>     // Assuming no failures we don't need to update the NumberOfPages value.
> 
> So now one obviously needs to define what Windows refers to as the
> frame number for these APIs, which is returned for each page in the
> PageArray parameter. I propose that the upper bits be the process
> ID and the lower bits be the position in the file, aligned to the
> page size and shifted. So for 4096-byte pages: [Process ID |
> (aligned_index >> 12)]. I know a Linux process ID can only be
> configured to be up to 22 bits on a 64-bit system. The Windows
> process ID is likely different, but 64 - 22 leaves 42 bits to identify
> a given page for a process.
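
The proposed frame-number layout can be sketched as follows. This is
only an illustration of the 22/42-bit split described above; the names
make_frame_id, frame_pid and frame_offset are hypothetical, and the
sketch assumes 64-bit IDs and 4096-byte pages:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4096-byte pages, as in the proposal */
#define PID_BITS   22                 /* widest Linux PID on a 64-bit system */
#define INDEX_BITS (64 - PID_BITS)    /* 42 bits left for the page index */

/* Pack a process ID and a page-aligned byte offset into one 64-bit ID. */
static uint64_t make_frame_id(uint32_t pid, uint64_t byte_offset)
{
    uint64_t index = byte_offset >> PAGE_SHIFT;

    assert(pid < (1u << PID_BITS));
    assert((byte_offset & ((1u << PAGE_SHIFT) - 1)) == 0);
    assert(index < ((uint64_t)1 << INDEX_BITS));
    return ((uint64_t)pid << INDEX_BITS) | index;
}

/* Recover the process ID from the upper bits. */
static uint32_t frame_pid(uint64_t id)
{
    return (uint32_t)(id >> INDEX_BITS);
}

/* Recover the byte offset into the backing file from the lower bits. */
static uint64_t frame_offset(uint64_t id)
{
    return (id & (((uint64_t)1 << INDEX_BITS) - 1)) << PAGE_SHIFT;
}
```

Note that PageArray elements are ULONG_PTR, which is only 32 bits wide
in a 32-bit process, so the split would need adjusting there.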
> 
> Lastly, although I am not sure it's necessary,
> AllocateUserPhysicalPages() implies the pages are locked in RAM. The
> memfd files, from my understanding, can be swapped. We could mmap()
> this file descriptor into the Wine server's memory with the MAP_SHARED
> flag, and then call mlock() on this mapping in the Wine server to
> ensure it's never swapped out. We could also use the unused page
> boundaries to store the free list. However, this will also eat up
> address space of the Wine server.
> 
> 
> So now let's discuss the MapUserPhysicalPages() function. In some
> regards this is simpler. It can only be called from the process that is
> mapping the pages. It needs to do the following:
>     * Get the file descriptor from the Wine server
>     * Start at the provided virtual address
>         * Check that the page IDs in the PageArray make sense (aka the
> ID is not in the free list, and the process ID matches)
>         * Check that each page is in an already mapped region of memory.
>         * Then mmap() each page-sized chunk referenced in the page
> array sequentially, starting from the virtual address
>             * Each mmap() should keep the same permissions as what the
> page at the address had before.
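
The mapping step in the list above could look roughly like this on the
Linux side. It is a sketch under assumptions: map_pages_at and
map_pages_demo are hypothetical names, page IDs are reduced to plain
page indices into the memfd, and the validation steps (free list,
process ID) are left out:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map file pages page_index[0..count-1] of fd back to back starting at base.
 * base must lie inside a region the caller has already reserved, because
 * MAP_FIXED replaces whatever is mapped there. Returns 0 on success. */
static int map_pages_at(int fd, void *base, const size_t *page_index,
                        size_t count, int prot)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t i = 0; i < count; i++) {
        void *want = (char *)base + i * page;
        void *got = mmap(want, page, prot, MAP_SHARED | MAP_FIXED,
                         fd, (off_t)(page_index[i] * page));
        if (got == MAP_FAILED) return -1;
    }
    return 0;
}

/* Hypothetical demo: map file pages 3 and 1 into a reserved two-page window
 * and check through pread() that writes landed in the right file pages. */
int map_pages_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages[2] = { 3, 1 };
    char c3 = 0, c1 = 0;

    int fd = memfd_create("awe-map-demo", MFD_CLOEXEC);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)(4 * page)) < 0) { close(fd); return -1; }

    /* Reserve address space first, standing in for the pre-existing region. */
    char *region = mmap(NULL, 2 * page, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { close(fd); return -1; }

    int rc = map_pages_at(fd, region, pages, 2, PROT_READ | PROT_WRITE);
    if (rc == 0) {
        region[0] = 'a';        /* lands in file page 3 */
        region[page] = 'b';     /* lands in file page 1 */
        if (pread(fd, &c3, 1, (off_t)(3 * page)) != 1 ||
            pread(fd, &c1, 1, (off_t)(1 * page)) != 1)
            rc = -1;
    }
    munmap(region, 2 * page);
    close(fd);
    return (rc == 0 && c3 == 'a' && c1 == 'b') ? 0 : -1;
}
```

The scatter variant would differ only in taking one target address per
page instead of computing base + i * page.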
> 
> Next MapUserPhysicalPagesScatter() is mostly the same as
> MapUserPhysicalPages(), but instead we handle an array of
> VirtualAddresses that gets mapped to each page in the page array.
> 
> The last function is FreeUserPhysicalPages(). Again, this one can be
> called by any process since it takes a process handle.
>     * Don't do anything to any chunks that are mmap()-ed. This implies
> we need to keep a reference count or somehow check that a process
> does not have this page-sized chunk mapped.
>     * If freeing a page would create a hole in the middle of the
> memory-backed file, add it to the free list
>         * Zero this page-sized chunk
>     * If the page-sized chunk is at the end, or all the page-sized
> chunks after it are also free
>         * ftruncate() the memory-backed file down in size
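
The hole-versus-tail bookkeeping above can be sketched without touching
a real file. Assumptions: a fixed-size flag array stands in for the
linked free list of the proposal, the mapped-page check is omitted, and
struct page_pool, pool_alloc and pool_free are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 1024              /* arbitrary cap for this sketch */

struct page_pool {
    size_t file_pages;              /* current backing file size, in pages */
    bool   is_free[MAX_PAGES];      /* true for holes below file_pages */
};

/* Allocate one page index: reuse the lowest hole, otherwise grow the file.
 * Returns -1 when the pool is full. */
static long pool_alloc(struct page_pool *p)
{
    for (size_t i = 0; i < p->file_pages; i++) {
        if (p->is_free[i]) {
            p->is_free[i] = false;
            return (long)i;
        }
    }
    if (p->file_pages == MAX_PAGES) return -1;
    return (long)p->file_pages++;   /* caller would ftruncate() up to this */
}

/* Free one page index and return the new file size in pages. */
static size_t pool_free(struct page_pool *p, size_t idx)
{
    assert(idx < p->file_pages && !p->is_free[idx]);
    p->is_free[idx] = true;         /* a hole; its contents should be zeroed */

    /* Drop every free page at the tail so the file can shrink. */
    while (p->file_pages > 0 && p->is_free[p->file_pages - 1])
        p->is_free[--p->file_pages] = false;
    return p->file_pages;           /* caller would ftruncate() down to this */
}
```

Freeing a middle page leaves the file size unchanged, while freeing the
last page also releases any holes directly below it.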
> 
> While this was mostly a broad overview, I hope it gives someone
> a good idea of where to start implementing these functions. I did
> think about it for a little while, since Linux does not quite have the
> same functionality.
> 
> Thanks,
> Keith Cancel
> 

