Add documentation on the address space layout in Wine

Fri May 28 00:26:06 CDT 2004

Mike Hearn <mh at codeweavers.com>
Add documentation on the address space layout in Wine

Generated from:
* mike at navi.cx--2004/wine--mainline--0.9--patch-25

--- /dev/null	2003-09-15 14:40:47.000000000 +0100
+++ documentation/address-space.sgml	2004-05-28 00:26:06.000000000 +0100
@@ -0,0 +1,175 @@
+<chapter id="address-space">
+  <title> Address space management </title>
+
+  <para>
+    Every Win32 process in Wine has its own dedicated native process on the host system, and
+    therefore its own address space. This chapter explores the layout of the Windows address space
+    and how Wine emulates it.
+  </para>
+
+  <para>
+    First, a quick recap of how virtual memory works. Physical memory in RAM chips is split
+    into <emphasis>frames</emphasis>, and the memory that each process sees is split
+    into <emphasis>pages</emphasis>. Each process has its own 4 gigabytes of address space (4 GB
+    being the maximum amount addressable with a 32-bit pointer). Pages can be mapped or unmapped:
+    an attempt to access an unmapped page causes an EXCEPTION_ACCESS_VIOLATION, which has the
+    easily recognizable code 0xC0000005. Any page can be mapped to any frame, so several
+    different addresses can actually "contain" the same memory. Pages can also be backed by
+    files or swap space, in which case accessing such a page causes a disk access to read its
+    contents into a free frame.
+  </para>
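+
+  <para>
+    The mapping of pages onto frames can be observed from user space on a POSIX host. The C
+    sketch below is only an illustration. The helper map_twice is hypothetical, and so is the
+    use of a scratch file under /tmp; Wine does neither. The sketch maps the same file twice,
+    which yields two different virtual addresses backed by the same memory, so a store through
+    one is visible through the other.
+  </para>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same len bytes of a scratch file at two different virtual
 * addresses.  Both mappings are MAP_SHARED views of one file, so they are
 * backed by the same page-cache frames: a store through *a is visible
 * through *b.  Returns 0 on success, -1 on failure. */
static int map_twice(size_t len, unsigned char **a, unsigned char **b)
{
    char path[] = "/tmp/aliasXXXXXX";   /* hypothetical scratch location */
    int fd;
    void *p, *q;

    assert(len > 0 && a && b);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file now lives on only through fd */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    q = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mappings keep the file alive */
    if (p == MAP_FAILED || q == MAP_FAILED) {
        if (p != MAP_FAILED)
            munmap(p, len);
        if (q != MAP_FAILED)
            munmap(q, len);
        return -1;
    }
    *a = p;
    *b = q;
    return 0;
}
```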
+
+  <sect1>
+    <title>Initial layout</title>
+  
+    <para>
+      When a Win32 process starts, it does not have an empty address space to use as it pleases.
+      Many pages are already mapped by the operating system. In particular, the EXE file itself and
+      any DLLs it needs are mapped into memory, and space has been reserved for the stack and a
+      couple of heaps (regions from which memory is allocated to the application). Some of these
+      things need to be at a fixed address, while others can be placed anywhere.
+    </para>
+
+    <para>
+      The EXE file itself is almost always mapped starting at address 0x400000. Most EXEs have
+      their relocation records stripped, which means they must be loaded at their preferred base
+      address and cannot be loaded anywhere else.
+    </para>
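+
+    <para>
+      Whether a given EXE can be moved is recorded in its PE headers. The C sketch below reads the
+      preferred base address and the IMAGE_FILE_RELOCS_STRIPPED flag from an in-memory copy of a
+      PE file. The function pe_image_info is hypothetical. It reads the headers by hand at the
+      documented PE/COFF offsets rather than going through any Wine or Windows API.
+    </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_FILE_RELOCS_STRIPPED 0x0001 /* FileHeader.Characteristics bit */

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the preferred ImageBase and the "relocations stripped" flag.
 * Returns 0 on success, -1 if the headers are missing or truncated. */
static int pe_image_info(const unsigned char *buf, size_t len,
                         uint64_t *image_base, int *relocs_stripped)
{
    uint32_t nt;                 /* e_lfanew: offset of "PE\0\0" */
    const unsigned char *fh, *opt;

    assert(buf && image_base && relocs_stripped);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    nt = rd32(buf + 0x3C);
    /* signature (4) + IMAGE_FILE_HEADER (20) + first 32 bytes of optional header */
    if (nt > len || len - nt < 4 + 20 + 32)
        return -1;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return -1;
    fh = buf + nt + 4;           /* IMAGE_FILE_HEADER */
    opt = fh + 20;               /* IMAGE_OPTIONAL_HEADER */
    *relocs_stripped = (rd16(fh + 18) & IMAGE_FILE_RELOCS_STRIPPED) != 0;
    switch (rd16(opt)) {
    case 0x10B:                  /* PE32: 32-bit ImageBase at offset 28 */
        *image_base = rd32(opt + 28);
        return 0;
    case 0x20B:                  /* PE32+: 64-bit ImageBase at offset 24 */
        *image_base = rd32(opt + 24) | ((uint64_t)rd32(opt + 28) << 32);
        return 0;
    default:
        return -1;
    }
}
```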
+
+    <para>
+      DLLs are internally much the same as EXE files, but they keep their relocation records, which
+      means they can be mapped at any address in the address space. Remember that we are dealing
+      with virtual memory, which is different for each process, not physical memory. OLEAUT32.DLL
+      may therefore be loaded at one address in one process and at a totally different one in
+      another. Ensuring that all the functions loaded into memory can find each other is the job of
+      the Windows dynamic linker, which is part of NTDLL.
+    </para>
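+
+    <para>
+      The sketch below illustrates what relocating an image involves. It is simplified and assumes
+      a 32-bit image whose .reloc data is already in memory on a little-endian host. The name
+      apply_relocs is hypothetical, and this is neither NTDLL's code nor Wine's. Each relocation
+      entry names a location that holds an absolute address, and the loader adds the distance by
+      which the image was moved.
+    </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_REL_BASED_ABSOLUTE 0   /* padding entry, ignored */
#define IMAGE_REL_BASED_HIGHLOW  3   /* patch a full 32-bit address */

/* Apply PE base relocation blocks to an image mapped away from its
 * preferred base.  delta = actual_base - preferred_base; unsigned
 * wrap-around also handles moving the image downwards.
 * Returns 0 on success, -1 on a malformed block or unsupported type. */
static int apply_relocs(unsigned char *image, size_t image_size,
                        const unsigned char *reloc, size_t reloc_size,
                        uint32_t delta)
{
    size_t pos = 0;

    assert(image && reloc);
    while (reloc_size - pos >= 8) {
        uint32_t page_rva, block_size, value;
        size_t i;

        memcpy(&page_rva, reloc + pos, 4);     /* little-endian host assumed */
        memcpy(&block_size, reloc + pos + 4, 4);
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;
        for (i = 8; i + 2 <= block_size; i += 2) {
            uint16_t entry;
            size_t rva;

            memcpy(&entry, reloc + pos + i, 2);
            rva = (size_t)page_rva + (entry & 0x0FFF);   /* low 12 bits: offset */
            switch (entry >> 12) {                       /* top 4 bits: type */
            case IMAGE_REL_BASED_ABSOLUTE:
                break;
            case IMAGE_REL_BASED_HIGHLOW:
                if (rva > image_size || image_size - rva < 4)
                    return -1;
                memcpy(&value, image + rva, 4);
                value += delta;
                memcpy(image + rva, &value, 4);
                break;
            default:
                return -1;       /* e.g. DIR64, not handled in this sketch */
            }
        }
        pos += block_size;
    }
    return 0;
}
```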
+
+    <para>
+      So, we have the EXE and its DLLs mapped into memory. Two other very important regions also
+      exist: the stack and the process heap. The process heap is simply the equivalent of the libc
+      malloc arena on UNIX: it is a region of memory managed by the OS, which malloc/HeapAlloc
+      partitions and hands out to the application. Windows applications can create several heaps,
+      but the process heap always exists. It is created as part of process initialization in
+      dlls/ntdll/thread.c:thread_init().
+    </para>
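+
+    <para>
+      The way a heap partitions one region can be sketched in a few lines of C. The sketch below is
+      a hypothetical bump allocator; toy_heap is not a Wine or Windows structure. Unlike
+      HeapAlloc and HeapFree, it never reuses freed memory. It only shows one region being handed
+      out piece by piece.
+    </para>

```c
#include <assert.h>
#include <stddef.h>

/* A toy "heap": one region handed over up front, carved into blocks on
 * request.  Real heaps also track and reuse freed blocks. */
struct toy_heap {
    unsigned char *base;   /* start of the region */
    size_t size;           /* total bytes in the region */
    size_t used;           /* bytes handed out so far */
};

#define TOY_ALIGN 8        /* block offsets are rounded up to a multiple of 8 */

static void toy_heap_init(struct toy_heap *h, void *region, size_t size)
{
    assert(h && region);
    h->base = region;
    h->size = size;
    h->used = 0;
}

/* Return n bytes from the heap, or NULL when the region is exhausted. */
static void *toy_heap_alloc(struct toy_heap *h, size_t n)
{
    size_t start = (h->used + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);

    if (start > h->size || h->size - start < n)
        return NULL;
    h->used = start + n;
    return h->base + start;
}
```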
+
+    <para>
+      Another heap is created as part of process startup: the so-called shared or system
+      heap. This is an undocumented service that exists only on Windows 9x; Wine implements it so
+      that native Win9x DLLs can be used. The shared heap is unusual in that anything allocated
+      from it is visible in every other process. This heap is always created at
+      SYSTEM_HEAP_BASE (0x65430000) and defaults to one megabyte in size.
+    </para>
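+
+    <para>
+      The POSIX sketch below illustrates the two properties described above: a preferred address
+      and contents visible from another process. Three simplifications apply. The helpers are
+      hypothetical. The address is only passed to mmap as a hint. The sharing shown is between a
+      parent and a forked child. This is not how Wine itself creates the system heap.
+    </para>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SYSTEM_HEAP_BASE 0x65430000UL   /* address quoted in the text */
#define SYSTEM_HEAP_SIZE (1024 * 1024)  /* default size: one megabyte */

/* Create a shared anonymous region, asking for it at base.  Without
 * MAP_FIXED the address is only a hint: the kernel uses it if the range
 * is free, otherwise it picks another address.  Returns NULL on failure. */
static void *map_shared_at(uintptr_t base, size_t len)
{
    void *p = mmap((void *)base, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Fork a child that writes into the region; return nonzero if the parent
 * then sees the write, i.e. the memory really is shared. */
static int child_write_visible(volatile unsigned char *p)
{
    pid_t pid;
    int status;

    assert(p);
    p[0] = 0;
    pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {          /* child */
        p[0] = 0x5A;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return p[0] == 0x5A;
}
```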
+
+    <para>
+      So far we have assumed that the entire 4 GB of address space is available to the
+      application. In fact it is not: only the lower 2 GB are. On Windows NT the upper 2 GB (from
+      0x80000000) are used by the operating system and hold the kernel. Why is the kernel mapped
+      into every address space? Mostly for performance. It is possible to give the kernel its own
+      address space too (this is what Ingo Molnar's 4G/4G VM split patch does for Linux), but then
+      every system call into the kernel has to switch address spaces. Because that is a fairly
+      expensive operation (it requires flushing the translation lookaside buffers, among other
+      things) and system calls are made frequently, it is best avoided by keeping the kernel
+      mapped at a constant position in every process's address space.
+    </para>
+
+    <para>
+      On Windows 9x, only the upper gigabyte (0xC0000000 and up) is used by the kernel; the
+      region from 2 to 3 GB is a shared area used for loading system DLLs and for file
+      mappings. On both NT and 9x, the bottom 2 GB are available for the program's memory
+      allocations and stack.
+    </para>
+
+    <para>
+      There are a few other magic locations. The bottom 64 KB of memory is deliberately left
+      unmapped to catch null pointer dereferences. The region from 64 KB to 4 MB is reserved for DOS
+      compatibility and contains various DOS data structures. Finally, the address space also
+      contains mappings for the Wine binary itself, any native libraries Wine is using, the glibc
+      malloc arena and so on.
+    </para>
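+
+    <para>
+      The fixed regions described so far can be summarised as a small classification function. It
+      is only a sketch of the figures given in this chapter, and the enum names are made up for
+      illustration. It does not model the mappings of the Wine binary, native libraries or the
+      glibc malloc arena, which can appear anywhere.
+    </para>

```c
#include <assert.h>
#include <stdint.h>

enum win_flavour { WIN_NT, WIN_9X };

enum region {
    REGION_NULL_GUARD,   /* bottom 64 KB, left unmapped */
    REGION_DOS,          /* 64 KB to 4 MB, DOS compatibility area */
    REGION_USER,         /* program memory and stack (bottom 2 GB) */
    REGION_SHARED_9X,    /* 2 GB to 3 GB on 9x: system DLLs and file mappings */
    REGION_KERNEL        /* reserved for the operating system */
};

/* Classify a 32-bit address according to the layout described above. */
static enum region classify(uint32_t addr, enum win_flavour os)
{
    assert(os == WIN_NT || os == WIN_9X);
    if (addr < 0x00010000u)
        return REGION_NULL_GUARD;
    if (addr < 0x00400000u)
        return REGION_DOS;
    if (addr < 0x80000000u)
        return REGION_USER;
    if (os == WIN_9X && addr < 0xC0000000u)
        return REGION_SHARED_9X;
    return REGION_KERNEL;    /* NT: from 2 GB up; 9x: from 3 GB up */
}
```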
+    
+  </sect1>
+
+  <sect1>
+    <title> Laying out the address space </title>
+
+    <para>
+      Until about the start of 2004, the Linux address space closely resembled the Windows 9x
+      layout: the kernel sat in the top gigabyte, the bottom pages were unmapped to catch null
+      pointer dereferences, and the rest was free. The kernel's mmap algorithm was predictable: it
+      would start by mapping files at low addresses and work up from there.
+    </para>
+
+    <para>
+      A series of new low-level patches violated many of these assumptions, and as a result Wine
+      now has to force the Win32 address space layout upon the system. This section looks at why
+      and how this is done.
+    </para>
+
+    <para>
+      The exec-shield patch increases security by randomizing the kernel's mmap algorithm. Rather
+      than consistently choosing the same addresses for the same sequence of requests, the kernel
+      now chooses randomized addresses. Because the Linux dynamic linker (ld-linux.so.2) loads
+      DSOs into memory using mmap, DSOs are no longer loaded at predictable addresses, which makes
+      it harder to attack software through buffer overflows. Exec-shield also attempts to
+      relocate certain binaries into a special low area of memory known as the ASCII armor, whose
+      addresses contain a zero byte, making it harder to jump into them with string-based attacks.
+    </para>
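+
+    <para>
+      On Linux this effect is easy to observe, because a process can read its own memory map from
+      /proc/self/maps. The sketch below reports where a given mapping was placed. It rests on two
+      assumptions: find_mapping is a hypothetical helper, and the /proc interface is
+      Linux-specific. Under a randomizing kernel, the addresses reported for a library such as
+      libc will typically differ between runs.
+    </para>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find the first line of /proc/self/maps containing needle (for example
 * "libc" or "[stack]") and return that mapping's address range.
 * Returns 0 if found, -1 otherwise. */
static int find_mapping(const char *needle, unsigned long *start, unsigned long *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char *line = NULL;
    size_t cap = 0;
    int found = -1;

    assert(needle && start && end);
    if (!f)
        return -1;
    while (getline(&line, &cap, f) != -1) {
        /* Each line starts with "start-end" in hexadecimal. */
        if (strstr(line, needle) && sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 0;
            break;
        }
    }
    free(line);
    fclose(f);
    return found;
}
```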
+
+    <para>
+      Prelink is a technology that improves startup times by precalculating ELF global offset
+      tables and saving the results inside the native binaries themselves. Because each DSO is
+      assigned its own fixed slot in the address space, the dynamic linker has far fewer
+      relocations to perform, so applications that rely heavily on dynamic linking load much
+      more quickly. Complex C++ applications such as Mozilla, OpenOffice and KDE especially
+      benefit from this technique.
+    </para>
+
+    <para>
+      The 4G/4G VM split patch was developed by Ingo Molnar. It gives the Linux kernel its own
+      address space, thereby allowing processes to access the maximum amount of memory addressable
+      on a 32-bit machine: 4 gigabytes. It lets people with lots of RAM make full use of it in any
+      given process, at the cost of performance. As mentioned previously, the reason for giving
+      the kernel part of each process's address space was to avoid the overhead of switching
+      address spaces on each system call.
+    </para>
+
+    <para>
+      Each of these changes alters the address space in a way that is incompatible with Windows.
+      Prelink and exec-shield mean that the libraries Wine uses can be placed at any point in the
+      address space: typically this meant that a library ended up in the region where the EXE you
+      wanted to run had to be loaded (remember that, unlike DLLs, EXE files usually cannot be moved
+      around in memory). The 4G/4G VM split means that programs can receive pointers into the top
+      gigabyte of address space, which some are not prepared for (they may store extra information
+      in the high bits of a pointer, for instance). Combined with exec-shield, this one is
+      especially deadly, as the process heap may then be allocated beyond ADDRESS_SPACE_LIMIT,
+      which causes Wine initialization to fail.
+    </para>
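+
+    <para>
+      The hypothetical C sketch below shows why pointers into the top gigabyte break some
+      programs. It models the high-bit trick mentioned above, and the pack/unpack helpers are
+      made up for illustration. Addresses are modelled as uint32_t, so the arithmetic matches a
+      32-bit process regardless of the host.
+    </para>

```c
#include <assert.h>
#include <stdint.h>

/* A (bad) trick some 32-bit programs use: borrow the top bit of a pointer
 * as a flag, assuming user addresses never reach 0x80000000. */
#define TAG_BIT 0x80000000u

/* Pack an address and a one-bit flag into one 32-bit word. */
static uint32_t pack(uint32_t addr, int flag)
{
    assert(flag == 0 || flag == 1);
    return flag ? (addr | TAG_BIT) : addr;
}

static uint32_t unpack_addr(uint32_t word) { return word & ~TAG_BIT; }
static int      unpack_flag(uint32_t word) { return (word & TAG_BIT) != 0; }
```

For an address below 2 GB the round trip is lossless. For an address such as 0xC0001000, the flag appears from nowhere and the unpacked address is corrupted to 0x40001000.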
+
+    <para>
+      The solution to these problems is for Wine to reserve particular parts of the address space,
+      so that the system will not place anything in the areas we do not want it to use. Wine later
+      reallocates or frees those areas as needed. One problem is that some of these mappings are
+      put in place automatically by the dynamic linker: for instance, any libraries that Wine is
+      linked to (such as libc, libwine and libpthread) are mapped into memory before Wine even
+      gets control. To solve that, Wine overrides the default ELF initialization sequence at a low
+      level and reserves the needed areas using direct system calls into the kernel (i.e. without
+      linking against any other code to do it), before restarting the standard initialization
+      and letting the dynamic linker continue. This is referred to as the preloader and is found
+      in ld-winepreload.so (loader/preloader.c).
+    </para>
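+
+    <para>
+      The idea of reserving address space now and using it later can be sketched with ordinary
+      mmap calls. This sketch goes through the libc wrapper, whereas the preloader issues the
+      system calls directly, so it illustrates the technique rather than reproducing preloader
+      code. The names reserve_range, commit_range and decommit_range are hypothetical.
+    </para>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve len bytes of address space without committing memory:
 * PROT_NONE makes any access fault, and MAP_NORESERVE asks the kernel not
 * to set aside swap for it.  Returns NULL on failure. */
static void *reserve_range(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand part of a reservation out for real use.  MAP_FIXED is safe here
 * only because [addr, addr+len) lies inside our own reservation. */
static void *commit_range(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    assert(p == MAP_FAILED || p == addr);
    return p == MAP_FAILED ? NULL : p;
}

/* Return part of a reservation to the reserved, inaccessible state. */
static int decommit_range(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```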
+
+    <para>
+      Once the usual ELF boot sequence has completed, some native libraries may well have been
+      mapped above the 3 GB limit. This does not matter, as 3 GB is a Windows limit, not a Linux
+      one. We still have to prevent the system from allocating anything else above there (such as
+      the heap or other DLLs), so Wine performs a binary search over the upper gigabyte of address
+      space, iteratively filling in the holes with MAP_NORESERVE mappings: the address space is
+      reserved, but no memory is set aside to back it. This code can be found in
+      libs/wine/mmap.c:reserve_area.
+    </para>
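+
+    <para>
+      The hole-filling strategy can be sketched as a recursive search. This sketch is written in
+      the spirit of reserve_area, not copied from it. The name reserve_holes and the min_size
+      cutoff are assumptions. The sketch also relies on Linux placing a mapping exactly at a
+      page-aligned address hint when the whole requested range is free, and elsewhere otherwise.
+    </para>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to reserve exactly [start, end) with a PROT_NONE, MAP_NORESERVE
 * mapping.  The address is only a hint (no MAP_FIXED), so existing
 * mappings are never clobbered: if any part of the range is in use, the
 * kernel places the mapping elsewhere and we undo it.  The range is then
 * split in half and each half retried, down to min_size.
 * Returns the number of bytes reserved. */
static size_t reserve_holes(uintptr_t start, uintptr_t end, size_t min_size)
{
    size_t size = end - start;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t mid;
    void *p;

    assert(start % page == 0 && end % page == 0 && min_size >= page);
    if (size == 0)
        return 0;
    p = mmap((void *)start, size, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == (void *)start)
        return size;                  /* the whole range was free */
    if (p != MAP_FAILED)
        munmap(p, size);              /* landed elsewhere: undo it */
    if (size <= min_size)
        return 0;                     /* occupied and too small to split */
    mid = start + (size / 2 / page) * page;
    return reserve_holes(start, mid, min_size) + reserve_holes(mid, end, min_size);
}
```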
+    
+  </sect1>
+  
+</chapter>
--- documentation/wine-devel.sgml
+++ documentation/wine-devel.sgml
@@ -13,6 +13,7 @@
 <!entity ddraw SYSTEM "ddraw.sgml">
 <!entity multimedia SYSTEM "multimedia.sgml">
 <!entity threading SYSTEM "threading.sgml">
+<!entity address-space SYSTEM "address-space.sgml">
 
 <!entity implementation SYSTEM "implementation.sgml">
 <!entity porting SYSTEM "porting.sgml">
@@ -138,6 +139,7 @@
     &implementation;
     &porting;
     &consoles;
+    &address-space;    
     &cvs-regression;
   </part>