32bit Wine on macOS Catalina

Zebediah Figura zfigura at codeweavers.com
Sat May 14 19:29:21 CDT 2022


I started out trying to pedantically clarify the question about "change 
in executable format", and ended up writing yet another summary of PE 
conversion. Hopefully someone will find this useful...

On 5/14/22 17:49, Charles Davis wrote:
>> How does a change in
>> executable format get around the WindowServer limitation?
> 
> Simple: Wine will now run 32-bit programs as a 64-bit process, using
> 64-bit system interfaces. This is, in fact, how Windows itself
> supports 32-bit programs on a 64-bit kernel.
> 
> The program-facing parts of Wine (*.dll files) will be in PE32 format.
> But the host-facing parts (*.so files) will still be in the native
> binary format of the host system--in the case of Mac OS, that's 64-bit
> Mach-O. The PE parts enter the Unix parts by making a "syscall" (i.e.
> not a real host syscall, just a sequence that looks a lot like an NT
> syscall to user-mode code), so from the program's point of view, all
> the host-specific code runs in "kernel mode." 32-bit programs are
> supported the same way they are in Windows, namely, with the
> Windows-on-Win64 subsystem, using thunks inside special WOW64 DLLs
> (wow64*.dll).
> 
It might be worth pointing out, just to further clarify Chip's answer, 
that the change in executable format does not *per se* get around any 
limitations.

What gets around the limitation is the ability to load a 32-bit code 
segment into the program, use it to run 32-bit application code, and 
then switch back to the 64-bit segment in order to call into 64-bit host 
libraries (such as, in this case, Quartz).

Using the PE executable format is not actually necessary to accomplish 
this (I believe that's true even for proper WoW64 support, though 
someone should correct me if I'm wrong; at the very least it's not 
necessary for any obvious reason). However:

* We need to define a boundary in Wine between 32-bit and 64-bit code. 
Switching between them requires a far jump (a.k.a. ljmp). We need to 
choose what code is compiled as 32-bit, and what code is compiled as 64-bit.

* Unrelatedly, some programs expect Win32 DLLs to be in PE format on 
disk, usually for the purposes of digital restrictions management or 
anti-tamper. On the other hand, we need some code to be in host 
(.so/.dylib) format, so that it can link to other host code. [At the 
very least we need dlopen().]

* At the same time, and still unrelatedly, there are a number of smaller 
differences between what Win32 programs expect of their execution 
environment and what Unix libraries expect of theirs:

   (a) Win32 programs expect the TEB to be in %fs or %gs (depending on 
architecture), but Unix C libraries tend to use it for their own 
per-thread data. glibc mostly coöperates with us in this respect by 
using whichever register Windows doesn't use. However, Mac libc does not 
(it does leave some commonly used TEB offsets untouched, but that has 
not always been enough). Additionally, bug 47198 concerns a rather 
creative anti-cheat that effectively means we need to reserve *both* 
registers.

   (b) Win32 programs manually specify the amount of stack they use, or 
even set up stacks manually by changing the %esp register (Cygwin is a 
primary offender). However, Unix libraries may demand much larger 
stacks. We get around this currently by allocating larger stacks than 
the Win32 program requests if possible. [I'm not sure I know of any 
cases where this isn't enough or causes problems?]

   (c) Win32 programs expect that the stack will be committed only by 
touching the guard page, whereas Unix libraries don't expect that they 
need to do this. We get around this by committing the whole stack from 
the beginning, but bug 47808 is caused by Cygwin allocating its own 
stack which is *not* fully committed. Both this, and the above, can be 
solved by having a separate Unix stack, and executing Unix code on that 
stack.

   (d) Win32 programs like debuggers can insert breakpoints anywhere, 
or, more importantly, break a running program regardless of where it's 
running. For various reasons, this causes problems if the thread is in 
the middle of Unix code. I don't know of any public bug reports for 
this, but there's a demand for using native debuggers such as Visual 
Studio, at least enough that CodeWeavers is interested in fixing it. 
This can be solved by effectively masking off suspend requests while 
inside of a Unix call.

The ultimate effect is that, for various reasons, we want to define 
boundaries, and it ends up actually making sense to make all of these 
boundaries the *same* boundary. There are a few reasons for this:

* Perhaps most importantly, all of the above require some nontrivial 
thunking. We need to do some work to change from PE code to .so code, or 
from 32-bit code to 64-bit code, or from Win32 code to Unix code. In 
particular:

   - changing from 32-bit code to 64-bit code requires a far/long jump, 
as has been stated;

   - changing from PE code to .so code requires some glue to determine 
*where* to jump to, as you can't just link from one to the other;

   - changing from Win32 code to Unix code requires swapping %fs and/or 
%gs, switching stacks, marking down that suspend requests are masked, etc.

We need to define where these transitions take place so that we can 
perform that thunking, and that definition ultimately takes some 
nontrivial effort: we need to manually write a thunk for every function 
that crosses the boundary. If we can write each thunk once, and perform 
all of the above transitions at the same time, that saves us 
considerable effort.

* It ends up making a lot of sense to think of Win32 code as "user" 
code, and Unix code as "kernel" code, especially when considering the 
problem of debuggers. A user-space debugger should be allowed to mess 
with any user-space code, but should effectively have suspend requests 
masked during kernel code, which matches what we want quite well. 
Similarly, real kernels will change stacks to execute kernel code.

* "User" code can always be expressed in PE format—after all, it only 
needs to link to other user libraries, plus the glue to the "kernel". On 
the other hand, it's not easy for Win32 programs to actually access 
kernel code in order to validate it against the on-disk form, so we 
don't actually need "kernel" code to be in PE format. If we write 
"kernel" code such that it never needs to call into a "user" library 
(which ends up being relatively easy), we can compile it in .so format 
and have it link to Unix libraries.

* The split between user and kernel code is *basically* done at the same 
place as the split between 32-bit and 64-bit code on Windows. There's a 
bit of handwaving here; filling it in would require a detailed 
explanation of how WoW64 works on Windows.


The ultimate effect is that at this point, what's called "PE conversion" 
only *partially* has to do with the PE format. Most of the work involved 
is defining the above thunks, and making sure that all code lands on one 
side of the split or the other. The other part is, of course, writing 
the generic thunking code: the parts that switch segments and stacks, 
set up the dynamic library glue, and so on.

The "writing the generic glue" part is, at this point, almost entirely 
done. The "splitting up the code and writing thunks" part is *mostly* 
done [considering where we started, it's been quite a long journey!], 
but there are still some bits left, and of course they are some of the 
trickiest parts of the Wine code base to split up.

Hope this was helpful for someone. I know Erich wrote a similar writeup, 
but there were a couple of missing bits in it that I wanted to clarify 
as well.

ἔρρωσθε,
Zeb


