Implementing AllocateUserPhysicalPages (Idea)

Keith Cancel admin at keith.pro
Sat Apr 17 02:50:24 CDT 2021


I hope this email is not too long. Linux does not quite have the
functionality of these functions:
	AllocateUserPhysicalPages()
	MapUserPhysicalPagesScatter()
	MapUserPhysicalPages()
	FreeUserPhysicalPages()

I noticed that they are stubs, and thus unimplemented. I have an idea
for how to implement these functions. I am not entirely familiar with
the Wine code base, but I will explain my idea nonetheless. Let's start
with AllocateUserPhysicalPages(). It needs to do a few things:
	* A process needs to be able to reserve pages for another
process (aka the HANDLE hProcess parameter).
	* Reserve memory without adding it to the virtual address space of
the caller or the target process.
	* Said memory is locked, aka mlock()-ed, and won't be swapped.
	
So the first thing is that any Windows process with permission to do
so needs to be able to reserve memory for other processes, including
itself. It makes sense to have what is currently a stub invoke what I
will describe next in the Wine server. That way, when the target
process tries to map the reserved memory, it can fetch the info it
needs from the Wine server. So the first things the handler in the
Wine server would need to do are:
	* Check that the invoking Windows process is allowed to do this (aka
SeLockMemoryPrivilege, and PROCESS_VM_OPERATION on the handle), or just
grant all processes this ability.
	* Check that the target Windows process exists.

Now we need to reserve memory that can be mapped and unmapped
multiple times in the target process without losing the memory
contents. The simplest way to do this would be a file. However, a file
resides on disk, so remapping it after it has been unmapped can be
slow. It also has the potential to pollute the file system if cleanup
fails. Luckily, Linux has the memfd_create() system call. It creates a
RAM-backed file and returns a file descriptor that can later be passed
to mmap(). This lets us create a persistent bit of memory that does
not pollute either the caller's or the target's address space. I
assume there is a per-process structure in the Wine server; we could
store this file descriptor there. We can also just use ftruncate() to
set the size equal to the number of bytes requested. However,
FreeUserPhysicalPages() makes things a tad more complicated: it can,
for instance, be used to free only a single page worth of memory. So
we need to track free page-sized chunks. Luckily, we are only tracking
fixed-size memory blocks, so it's not as bad as it could be. So in
C-ish pseudocode, do the following for AllocateUserPhysicalPages():

    remaining_pages  = NumberOfPages; // NumberOfPages is a function parameter
    // If there is a per-process structure like this, get it
    proc_struct      = get_process_struct(win_process_id/handle);
    page_array_index = 0; // Track what index we have written to
    if(proc_struct.memfd == -1) {
        proc_struct.memfd          = memfd_create("debug name", {FLAGS});
        proc_struct.memfd_sz       = 0;
        proc_struct.free_list_head = NULL;
    }
    // Check the free list first, since FreeUserPhysicalPages() can free,
    // for instance, a single page
    next_ptr = proc_struct.free_list_head;
    while(next_ptr != NULL && remaining_pages > 0) {
        // PageArray is another function parameter
        PageArray[page_array_index] = {However the IDing is done for pages};

        next_ptr = next_ptr->next;
        proc_struct.free_list_head = next_ptr;

        page_array_index++;
        remaining_pages--;
    }

    if(remaining_pages > 0) {
        old_size = proc_struct.memfd_sz;
        new_size = old_size + (page_sz * remaining_pages);
        proc_struct.memfd_sz = new_size;
        ftruncate(proc_struct.memfd, new_size);
        for(int i = 0; i < remaining_pages; i++) {
            // Calculate the ID starting from old_size and increment it up
            PageArray[page_array_index] = {However the IDing is done for pages};
            page_array_index++;
        }
    }
    // Assuming no failures we don't need to update the NumberOfPages value.
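
For anyone who wants something that compiles, here is a minimal C
sketch of the pseudocode above. It is not Wine server code: struct
aw_proc, struct free_page and aw_alloc_pages() are hypothetical names,
the page ID is simply the page index inside the memfd, and a 4096-byte
page size is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 4096u /* assumed page size for this sketch */

/* Hypothetical free-list node: one freed page-sized chunk of the memfd. */
struct free_page {
    uint64_t          index; /* page index inside the memfd */
    struct free_page *next;
};

/* Hypothetical per-process state; the real Wine server layout may differ. */
struct aw_proc {
    int               memfd;    /* -1 until the first allocation */
    off_t             memfd_sz; /* current size of the memfd in bytes */
    struct free_page *free_list_head;
};

/* Fills page_array with up to n_pages page indices.
 * Returns the number of pages handed out, or -1 if no memfd could be made. */
static long aw_alloc_pages(struct aw_proc *p, uint64_t *page_array,
                           size_t n_pages)
{
    size_t done = 0;

    assert(p != NULL && page_array != NULL);
    if (p->memfd == -1) {
        p->memfd = memfd_create("awe-pages", MFD_CLOEXEC);
        if (p->memfd == -1) return -1;
        p->memfd_sz = 0;
        p->free_list_head = NULL;
    }

    /* Reuse previously freed pages first. */
    while (p->free_list_head != NULL && done < n_pages) {
        struct free_page *node = p->free_list_head;
        page_array[done++] = node->index;
        p->free_list_head = node->next;
        free(node);
    }

    /* Grow the file for whatever is still missing. */
    if (done < n_pages) {
        size_t remaining = n_pages - done;
        off_t  old_size  = p->memfd_sz;
        off_t  new_size  = old_size + (off_t)remaining * PAGE_SZ;

        if (ftruncate(p->memfd, new_size) == -1) return (long)done;
        p->memfd_sz = new_size;
        for (size_t i = 0; i < remaining; i++)
            page_array[done++] = (uint64_t)(old_size / PAGE_SZ) + i;
    }
    return (long)done;
}
```

Returning the count mirrors how the Windows call reports a partial
allocation through NumberOfPages; whether the server should do it this
way is an open question.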

So now, obviously, one needs to be able to identify what Windows
refers to as the frame number in these APIs, which gets returned for
each page in the PageArray parameter. I propose that the upper bits be
the process ID and the lower bits be the position in the file
descriptor, aligned to the page size and shifted. So for 4096-byte
pages: [Process ID | (aligned_index >> 12)]. I know a Linux process ID
can only be configured to be up to 22 bits on a 64-bit system. While
the Windows process ID is likely different, 64 - 22 leaves 42 bits to
identify a given page for a process.
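
As a rough illustration of that layout, here is a small C sketch. The
helper names (frame_pack, frame_pid, frame_offset) are made up for this
email, and it assumes 64-bit frame numbers, 4096-byte pages, and a
process ID that fits in 22 bits; which process ID actually goes in
there is still undecided.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                /* 4096-byte pages */
#define PID_BITS   22                /* assumed width of the process ID */
#define INDEX_BITS (64 - PID_BITS)   /* 42 bits left for the page index */
#define INDEX_MASK ((UINT64_C(1) << INDEX_BITS) - 1)

/* Packs [process ID | (byte_offset >> 12)] into one 64-bit frame number.
 * byte_offset is the page-aligned position inside the memfd. */
static uint64_t frame_pack(uint32_t pid, uint64_t byte_offset)
{
    uint64_t index = byte_offset >> PAGE_SHIFT;

    assert((byte_offset & ((UINT64_C(1) << PAGE_SHIFT) - 1)) == 0);
    assert(pid < (UINT32_C(1) << PID_BITS));
    assert(index <= INDEX_MASK);
    return ((uint64_t)pid << INDEX_BITS) | index;
}

/* Recovers the process ID from the upper bits. */
static uint32_t frame_pid(uint64_t frame)
{
    return (uint32_t)(frame >> INDEX_BITS);
}

/* Recovers the byte offset inside the memfd from the lower bits. */
static uint64_t frame_offset(uint64_t frame)
{
    return (frame & INDEX_MASK) << PAGE_SHIFT;
}
```

For example, frame_pack(1234, 3 * 4096) puts 1234 in the upper 22 bits
and page index 3 in the lower 42 bits, and frame_offset() gives back
12288.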

Lastly, although I am not sure it's necessary,
AllocateUserPhysicalPages() implies the pages are locked in RAM. From
my understanding, memfd files can be swapped. We could mmap() this
file descriptor into the Wine server's memory with the MAP_SHARED
flag, and then call mlock() on this mapping in the Wine server to
ensure it's never swapped out. We could also use the unused page
boundaries to store the free list. However, this will also eat up
address space of the Wine server.
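
A minimal sketch of the pinning step, assuming the server already
holds the memfd and knows its size. pin_memfd() is a hypothetical
helper, not an existing Wine function. Note that mlock() can fail when
the process is over its RLIMIT_MEMLOCK limit, so the sketch reports
whether the lock succeeded instead of treating failure as fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the first len bytes of fd as a shared mapping and tries to pin it.
 * Returns the mapping, or NULL if mmap() failed.
 * *locked is set to 1 if mlock() succeeded and to 0 otherwise. */
static void *pin_memfd(int fd, size_t len, int *locked)
{
    void *base;

    assert(len > 0 && locked != NULL);
    /* MAP_SHARED so the mapping refers to the file's own pages. */
    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return NULL;

    /* Pin the pages in RAM; this may fail under a low RLIMIT_MEMLOCK. */
    *locked = (mlock(base, len) == 0);
    return base;
}
```

Since the file grows through ftruncate(), the mapping would have to be
extended and locked again after each growth; the sketch leaves that
out.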


So now let's discuss the MapUserPhysicalPages() function. In some
regards this one is simpler. It can only be called from the process
that is mapping the pages. It needs to do the following:
    * Get the file descriptor from the Wine server
    * Start at the provided virtual address
        * Check that the page IDs in the PageArray make sense (aka
ID is not in the free list, and the process ID matches)
        * Check that each page is in an already mapped region of memory.
        * Then mmap() each page-sized chunk referenced in the page
array sequentially, starting from the virtual address
            * Each mmap() should keep the same permissions as what the
page at the address had before.
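
The mmap() step could look roughly like the C sketch below. It assumes
the file descriptor has already been fetched, the IDs have been
validated and reduced to page indices inside the memfd, and the target
range is already mapped in the calling process. map_pages_fixed() is a
hypothetical name, and the caller passes in the protection it wants to
keep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps n_pages page-sized chunks of fd at consecutive addresses starting
 * at va. page_index[i] is the page offset inside fd for the i-th slot.
 * MAP_FIXED replaces whatever was mapped at each address before.
 * Returns 0 on success, -1 as soon as one mmap() fails. */
static int map_pages_fixed(int fd, void *va, const uint64_t *page_index,
                           size_t n_pages, int prot)
{
    long page_sz = sysconf(_SC_PAGESIZE);

    assert(page_sz > 0 && (uintptr_t)va % (uintptr_t)page_sz == 0);
    for (size_t i = 0; i < n_pages; i++) {
        void *want = (char *)va + i * (size_t)page_sz;
        off_t off  = (off_t)page_index[i] * page_sz;
        void *got  = mmap(want, (size_t)page_sz, prot,
                          MAP_SHARED | MAP_FIXED, fd, off);

        if (got == MAP_FAILED) return -1;
    }
    return 0;
}
```

Because the mappings are MAP_SHARED, writes through them land in the
memfd, so the contents survive when the window is later remapped to
other pages.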

Next, MapUserPhysicalPagesScatter() is mostly the same as
MapUserPhysicalPages(), but instead of one starting address we handle
an array of VirtualAddresses, one for each page in the page array.

The last function is FreeUserPhysicalPages(). Again, this one can be
called by any process since it takes a process handle.
    * Don't do anything to any chunks that are mmap()-ed. This implies
we need to keep a reference count or somehow check that a process
does not have this page-sized chunk mapped.
    * If freeing a page would create a hole in the middle of the
memory-backed file, add it to the free list
        * Zero this page-sized chunk
    * If the page-sized chunk is at the end, or all the page-sized
chunks after it are also free
        * ftruncate() the memory-backed file down in size
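
Here is a rough C sketch of those rules. To keep it short, a
byte-per-page map stands in for the free list and a per-page counter
stands in for the reference count; struct awe_file, awe_file_init() and
awe_free_page() are hypothetical names, and MAX_PAGES is an arbitrary
cap. It also assumes 4096-byte pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ   4096u
#define MAX_PAGES 1024u /* arbitrary cap for the sketch */

struct awe_file {
    int     fd;                   /* memfd backing the pages */
    size_t  n_pages;              /* current file size in pages */
    uint8_t is_free[MAX_PAGES];   /* 1 = hole that can be reused */
    uint8_t map_count[MAX_PAGES]; /* mappings that reference the page */
};

/* Creates the memfd and sizes it to n_pages. Returns 0 or -1. */
static int awe_file_init(struct awe_file *f, size_t n_pages)
{
    memset(f, 0, sizeof *f);
    f->fd = memfd_create("awe-pages", MFD_CLOEXEC);
    if (f->fd == -1 || n_pages > MAX_PAGES) return -1;
    if (ftruncate(f->fd, (off_t)n_pages * PAGE_SZ) == -1) return -1;
    f->n_pages = n_pages;
    return 0;
}

/* Returns 0 if the page was freed, -1 if it is invalid, already free,
 * still mapped, or the file operation failed. */
static int awe_free_page(struct awe_file *f, size_t index)
{
    static const char zeros[PAGE_SZ];
    size_t new_count;

    assert(f != NULL);
    if (index >= f->n_pages || f->is_free[index]) return -1;
    if (f->map_count[index] != 0) return -1; /* still mapped: leave it */

    if (index + 1 < f->n_pages) {
        /* Hole in the middle: zero the chunk and remember it as free. */
        if (pwrite(f->fd, zeros, PAGE_SZ, (off_t)index * PAGE_SZ)
                != (ssize_t)PAGE_SZ)
            return -1;
        f->is_free[index] = 1;
        return 0;
    }

    /* Last page: drop it together with any free pages right before it. */
    new_count = index;
    while (new_count > 0 && f->is_free[new_count - 1]) new_count--;
    if (ftruncate(f->fd, (off_t)new_count * PAGE_SZ) == -1) return -1;
    memset(f->is_free + new_count, 0, f->n_pages - new_count);
    f->n_pages = new_count;
    return 0;
}
```

Since a free run at the end is always truncated away, the last page of
the file is never a free one, which is why only the "page is at the
end" case needs checking here.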

While this was mostly a broad overview, I hope it gives someone a good
idea of where to start implementing these functions. I did think about
it for a little while, since Linux does not quite have the same
functionality.

Thanks,
Keith Cancel


