Project: x86 to ARM binary translator

Yale Zhang yzhang1985 at gmail.com
Fri Apr 8 04:16:00 CDT 2011


Thanks, everyone, for your comments. I took some time to reread the FX!32 and
Transmeta Crusoe publications (I first read them 3 years ago while I was at
Georgia Tech) to see what the challenges are.

The simplest approach is what Stefan proposed: run the Windows app inside
x86 Wine, inside of QEMU (target = x86, host = ARM). Pretty clever, but the
two layers of translation, instead of one, might cause problems. Also, as I
said earlier, I don't think QEMU's code generator produces fast enough code,
so that would need to be improved (with no change to Wine). I will try it
and see what happens.

The second approach, which is almost identical to FX!32 (which runs x86
Windows programs on Alpha Windows), is what Stefan proposed second: create
a standalone process VM to run x86 Windows apps on ARM Windows, using
wrappers to translate calls to x86 Windows functions into calls to ARM
Windows functions. I think those wrappers/jackets can be generated
automatically by scanning header files.
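To make the wrapper/jacket idea concrete, here is a minimal Python sketch of what one jacket does at run time, assuming the x86 stdcall convention used by Win32 APIs (arguments pushed right to left, callee pops them). `GuestCPU` and `make_stdcall_jacket` are hypothetical names, and the sketch only handles plain 32-bit integer arguments; pointer arguments would additionally need guest-to-host address translation.

```python
import struct


class GuestCPU:
    """Hypothetical emulated x86 state: flat little-endian memory plus a few registers."""

    def __init__(self, mem_size=4096):
        self.mem = bytearray(mem_size)
        self.esp = mem_size  # stack grows down from the top of guest memory
        self.eax = 0
        self.eip = 0

    def push32(self, value):
        self.esp -= 4
        struct.pack_into("<I", self.mem, self.esp, value & 0xFFFFFFFF)

    def read32(self, addr):
        return struct.unpack_from("<I", self.mem, addr)[0]


def make_stdcall_jacket(host_func, nargs):
    """Wrap a host function taking `nargs` 32-bit integer arguments.

    On entry (right after the guest's `call`): [esp] = return address,
    [esp+4], [esp+8], ... = first, second, ... argument.
    """
    def jacket(cpu):
        ret_addr = cpu.read32(cpu.esp)
        args = [cpu.read32(cpu.esp + 4 * (i + 1)) for i in range(nargs)]
        cpu.eax = host_func(*args) & 0xFFFFFFFF  # 32-bit result is returned in eax
        cpu.esp += 4 * (nargs + 1)               # like `ret 4*nargs`: pop return address and args
        cpu.eip = ret_addr                       # resume the guest after its call
    return jacket


if __name__ == "__main__":
    cpu = GuestCPU()
    cpu.push32(5)         # second argument (pushed first)
    cpu.push32(7)         # first argument
    cpu.push32(0x401000)  # return address pushed by `call`
    make_stdcall_jacket(lambda a, b: a + b, 2)(cpu)
    print(cpu.eax, hex(cpu.eip), cpu.esp)  # 12 0x401000 4096
```

With jackets in this form, a header scanner would only need to emit one `make_stdcall_jacket(host_function, argument_count)` call per prototype.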

I still don't like this approach because it does the API translation and
the instruction set translation in two separate programs. Ideally, I would
take the Darwine approach of doing both API translation and binary
translation in Wine.

To me, the API translation is less interesting than doing x86 to ARM
translation efficiently. As I said earlier, QEMU's approach of translating
target instructions => micro-ops => host instructions is inefficient because
it generates redundant operations.
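To illustrate the kind of redundancy meant here, the Python sketch below lowers each x86 `add reg, reg` independently into load/compute/store micro-ops, then removes the redundant ones with two simple passes: load forwarding and dead-store elimination. The micro-op format is hypothetical (it is not QEMU's actual TCG op set), x86 flag updates are ignored, and every guest register is assumed live at the end of the block.

```python
def naive_lower(x86_block):
    """Lower each ("add", dst, src) x86 op on its own: reload operands, store the result."""
    ops, n = [], 0
    for _, dst, src in x86_block:
        a, b, r = f"t{n}", f"t{n + 1}", f"t{n + 2}"
        n += 3
        ops += [("ld", a, dst), ("ld", b, src), ("add", r, a, b), ("st", dst, r)]
    return ops


def optimize(ops):
    # Pass 1 (forward): if a guest register's value is already in a temp,
    # replace later loads of that register with the temp.
    holds, alias, fwd = {}, {}, []
    for op in ops:
        if op[0] == "ld":
            _, t, reg = op
            if reg in holds:
                alias[t] = holds[reg]  # redundant load: reuse the earlier temp
                continue
            holds[reg] = t
            fwd.append(op)
        elif op[0] == "add":
            _, r, a, b = op
            fwd.append(("add", r, alias.get(a, a), alias.get(b, b)))
        else:  # "st"
            _, reg, t = op
            t = alias.get(t, t)
            holds[reg] = t
            fwd.append(("st", reg, t))

    # Pass 2 (backward): a store is dead if the same guest register is stored
    # again later in the block with no load of it in between.
    stored_later, out = set(), []
    for op in reversed(fwd):
        if op[0] == "st":
            if op[1] in stored_later:
                continue
            stored_later.add(op[1])
        elif op[0] == "ld":
            stored_later.discard(op[2])
        out.append(op)
    out.reverse()
    return out


if __name__ == "__main__":
    block = [("add", "eax", "ebx"), ("add", "eax", "ecx")]  # add eax, ebx; add eax, ecx
    naive = naive_lower(block)
    print(len(naive), "->", len(optimize(naive)))  # 8 -> 6
```

The reload of `eax` and the intermediate store to `eax` both disappear; a translator that sees both x86 instructions at once would not emit them in the first place.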

1. Transmeta Code Morphing Software: no emulation of x86 instructions; it
   always translates to native instructions (though not always with
   optimizations). Hot code is retranslated with optimizations.
2. FX!32: first emulates x86 instructions, then picks candidates for
   translation to Alpha instructions.
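The difference between the two strategies comes down to what happens to a block before it is known to be hot. A minimal Python sketch, assuming a simple per-block execution counter and an arbitrary threshold (real systems tune this, and FX!32's profile-driven selection is more involved than a counter); `TieredDispatcher` is a hypothetical name:

```python
class TieredDispatcher:
    """Choose how to run a guest block based on how often it has executed.

    policy "crusoe": translate immediately, retranslate hot blocks with optimizations.
    policy "fx32":   emulate first, translate hot blocks to native code.
    """

    def __init__(self, policy, hot_threshold=50):  # threshold is an arbitrary example value
        if policy not in ("crusoe", "fx32"):
            raise ValueError(policy)
        self.policy = policy
        self.hot_threshold = hot_threshold
        self.counts = {}  # guest block address -> times executed

    def tier_for(self, addr):
        n = self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.policy == "crusoe":
            cold, hot = "translate", "optimize"
        else:
            cold, hot = "emulate", "translate"
        return hot if n > self.hot_threshold else cold


if __name__ == "__main__":
    d = TieredDispatcher("fx32", hot_threshold=3)
    print([d.tier_for(0x401000) for _ in range(5)])
    # ['emulate', 'emulate', 'emulate', 'translate', 'translate']
```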

I'm tempted to do a quick-and-dirty x86 to ARM translation for cold code
that isn't a candidate for optimization. But since any non-trivial code
transformation/optimization is best done on a *simple* intermediate
representation, I will have to use an intermediate representation for hot
code that needs to be optimized.
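A quick-and-dirty cold-code translator can be little more than a table of per-instruction templates. The Python sketch below handles only `mov`/`add`/`sub` with register or small immediate operands, its x86-to-ARM register assignment is a hypothetical choice, and x86's AF and PF flags are ignored. It does show one wrinkle such templates must handle: after a subtraction, ARM's carry flag means "no borrow", the inverse of x86's CF. Anything not in the table falls back to the IR-based path.

```python
# Hypothetical fixed assignment of x86 guest registers to ARM host registers.
REG_MAP = {"eax": "r0", "ecx": "r1", "edx": "r2", "ebx": "r3",
           "esp": "r4", "ebp": "r5", "esi": "r6", "edi": "r7"}


def operand(op):
    """Map an x86 register or integer immediate to ARM operand syntax."""
    if op in REG_MAP:
        return REG_MAP[op]
    # Assumes the immediate fits ARM's rotated 8-bit immediate encoding.
    return "#" + str(int(op, 0))


def translate(insn):
    """Translate one two-operand x86 instruction (Intel syntax) to ARM assembly."""
    mnemonic, dst, src = insn.replace(",", " ").split()
    d, s = REG_MAP[dst], operand(src)
    if mnemonic == "mov":
        return [f"mov {d}, {s}"]        # x86 mov leaves flags alone; so does ARM mov
    if mnemonic == "add":
        return [f"adds {d}, {d}, {s}"]  # flag-setting form: N/Z/C/V match x86 SF/ZF/CF/OF
    if mnemonic == "sub":
        # ARM sets C = NOT borrow here, the inverse of x86 CF, so a later
        # `jb`/`jc` must be translated to test carry-clear (`bcc`/`blo`).
        return [f"subs {d}, {d}, {s}"]
    raise NotImplementedError(f"{mnemonic}: leave this to the IR-based translator")


if __name__ == "__main__":
    for insn in ["mov ecx, 0x10", "add eax, ebx", "sub esi, 4"]:
        print(insn, "=>", translate(insn)[0])
```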

But writing a direct x86 to ARM translator will be a lot of work and won't
be portable to other targets (a resurgent MIPS?).

Therefore, another approach would be to use QEMU as is, but apply LLVM
optimizations to hot candidates, as was done earlier in a Google Summer of
Code project. This will be very slow on the first run of the program, but a
persistent translation cache, like the ones used by FX!32 and by .NET for
assemblies, will make subsequent executions much faster. The static
persistent translation won't ever be complete, however, because of unknown
indirect branch targets, so it will keep growing. I think the main reason
FX!32 uses a persistent translation cache is that it relies on emulation,
which would otherwise be intolerable if done on every application launch.
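A minimal Python sketch of such a persistent cache, assuming translations are keyed by a hash of the guest module's bytes plus the block offset and stored as JSON. `PersistentTranslationCache` and `fake_translate` are hypothetical, translated code is represented by placeholder strings, and a real cache would store native code along with the relocation data needed to reuse it.

```python
import hashlib
import json
import os
import tempfile


class PersistentTranslationCache:
    """Hypothetical on-disk cache of translated blocks that survives between runs.

    Keys combine a hash of the guest module's bytes with the block's offset,
    so a changed executable never picks up a stale translation.
    """

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    @staticmethod
    def key(module_bytes, offset):
        return hashlib.sha256(module_bytes).hexdigest()[:16] + ":" + hex(offset)

    def lookup(self, module_bytes, offset, translate):
        k = self.key(module_bytes, offset)
        if k not in self.entries:
            # Miss: e.g. a target first discovered through an indirect branch,
            # which is why the cache keeps growing across runs.
            self.entries[k] = translate(offset)
        return self.entries[k]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.entries, f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "xlat-cache.json")
    module = b"\x55\x89\xe5\xc3"  # stand-in for the guest executable's bytes
    fake_translate = lambda off: f"arm code for {off:#x}"  # placeholder translator

    run1 = PersistentTranslationCache(path)
    run1.lookup(module, 0x10, fake_translate)
    run1.save()

    run2 = PersistentTranslationCache(path)  # a later launch reloads the cache
    print(len(run2.entries))  # 1: no retranslation needed for 0x10
```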

Other issues:

x86 condition flag evaluation: I want to do this lazily, but how do I
determine the liveness of those values? That is, given an instruction that
uses a condition flag, how do I find the instruction that generated it?
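Two standard techniques address this, sketched below in Python under simplifying assumptions. Statically, a backward liveness scan over a basic block finds flag writes that are overwritten before any read, so they never need to be computed; flags are conservatively treated as live at block exit unless the successors are known. Dynamically, lazy evaluation records the operands of the most recent flag-setting instruction, which is exactly the producer a later consumer needs, and derives individual flags only on demand. Only ZF and CF for 32-bit `add`/`sub` are modeled, and the instruction tuples are a hypothetical representation.

```python
ALL_FLAGS = frozenset({"CF", "ZF", "SF", "OF"})


def dead_flag_writes(block, live_out=ALL_FLAGS):
    """Backward liveness over one basic block of (text, flags_written, flags_read).

    Returns, per instruction, the flags it writes that no later instruction
    reads before they are overwritten; those need not be computed.
    """
    live = set(live_out)
    dead = [set() for _ in block]
    for i in range(len(block) - 1, -1, -1):
        _, writes, reads = block[i]
        dead[i] = set(writes) - live
        live = (live - set(writes)) | set(reads)
    return dead


class LazyFlags:
    """Remember the last flag-setting operation; compute flags only when read."""

    def record(self, op, a, b):
        self.op, self.a, self.b = op, a & 0xFFFFFFFF, b & 0xFFFFFFFF
        if op == "add":
            self.result = (self.a + self.b) & 0xFFFFFFFF
        elif op == "sub":
            self.result = (self.a - self.b) & 0xFFFFFFFF
        else:
            raise NotImplementedError(op)
        return self.result

    def zf(self):
        return self.result == 0

    def cf(self):
        if self.op == "add":
            return self.result < self.a  # wrapped around: carry out of bit 31
        return self.a < self.b           # sub: unsigned borrow


if __name__ == "__main__":
    block = [("add eax, ebx", ALL_FLAGS, set()),
             ("sub ecx, 1", ALL_FLAGS, set()),
             ("jz target", set(), {"ZF"})]
    # Assume the successors overwrite all flags before reading them.
    print(dead_flag_writes(block, live_out=frozenset()))
    flags = LazyFlags()
    flags.record("sub", 1, 2)
    print(flags.zf(), flags.cf())  # False True
```

Here all of the `add`'s flags are dead, and of the `sub`'s flags only ZF has to be materialized for the `jz`.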


Stefan,
---------------------------------------------------------------------------------------------
"ARM doesn't need [dealing with unaligned loads/stores], but PPC does"

OK, good to know. Earlier, I thought ARM didn't allow unaligned loads/stores
at all, but apparently ARMv6 and later do.

Andre,
-----------------------------------------------------------------------------------------------

"Maybe first to ARM Linux and then to ARM Android?"

Yes, if I can figure out how to install Ubuntu onto my Xoom. I saw someone
do it here <http://www.youtube.com/watch?v=xDB0PMrGdN0>, but he's just
running the userspace part of Ubuntu on top of Android, so I'm not sure
that will be as compatible as running a native Linux kernel.

"I know about that and was told it was never implemented because of problems
with the endianess"

Right, if the endianness were different and every load/store required a
byte swap, that would kill performance. Luckily, ARM can operate in both
little- and big-endian modes.
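To show where the cost would come from, here is a Python sketch that models a host's native 32-bit load. x86 data is little-endian, so on a big-endian host every guest load (and, symmetrically, every store) needs an extra byte swap, while a little-endian host can use the loaded value directly. The function names are illustrative only.

```python
def host_load32(mem, addr, host_little_endian):
    """Model what a plain 32-bit load instruction returns on the host."""
    order = "little" if host_little_endian else "big"
    return int.from_bytes(mem[addr:addr + 4], order)


def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def guest_load32(mem, addr, host_little_endian):
    """Load a 32-bit x86 (little-endian) value from guest memory."""
    value = host_load32(mem, addr, host_little_endian)
    # A big-endian host pays for an extra swap on every single guest access.
    return value if host_little_endian else bswap32(value)


if __name__ == "__main__":
    mem = bytes([0x78, 0x56, 0x34, 0x12])  # how x86 stores 0x12345678
    print(hex(host_load32(mem, 0, host_little_endian=False)))   # 0x78563412
    print(hex(guest_load32(mem, 0, host_little_endian=False)))  # 0x12345678
    print(hex(guest_load32(mem, 0, host_little_endian=True)))   # 0x12345678
```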

Damjan,
 -------------------------------------------------------------------------------------------------

In theory, binary translation will allow Flash Player and the Java JVM to
run on ARM, but there might be complications because those programs
generate and execute x86 code at run time.
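The main complication is that a JIT may overwrite code the translator has already translated, so stale translations must be discarded. A minimal Python sketch of page-granular invalidation, assuming 4 KiB guest pages and assuming the VM is told about every guest write (a real implementation would more likely write-protect pages containing translated code and catch the resulting faults); `CodeCache` is a hypothetical name:

```python
PAGE_SIZE = 4096  # assumed guest page size


class CodeCache:
    """Translation cache that drops translations when their guest page is written."""

    def __init__(self):
        self.translations = {}    # guest block start address -> translated code
        self.blocks_on_page = {}  # guest page number -> block start addresses

    def add(self, addr, length, code):
        self.translations[addr] = code
        first, last = addr // PAGE_SIZE, (addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):  # a block may straddle pages
            self.blocks_on_page.setdefault(page, set()).add(addr)

    def on_guest_write(self, addr):
        """Called when guest code (e.g. a JIT) writes to `addr`."""
        for block in self.blocks_on_page.pop(addr // PAGE_SIZE, set()):
            # Other pages may still list this block; pop() with a default
            # tolerates that. It is retranslated on its next execution.
            self.translations.pop(block, None)


if __name__ == "__main__":
    cache = CodeCache()
    cache.add(0x1000, 16, "block A")
    cache.add(0x1FF8, 16, "block B")  # spans pages 1 and 2
    cache.on_guest_write(0x2004)      # JIT patches page 2
    print(sorted(hex(a) for a in cache.translations))  # ['0x1000']
```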

Also, I agree that improving QEMU's binary translation would be the
simplest approach, but as I said earlier, I have a feeling that doing API
translation and instruction set translation in two separate programs might
cause problems.


                    Yale


On Sat, Apr 2, 2011 at 9:06 AM, Damjan Jovanovic <damjan.jov at gmail.com> wrote:

> On Sat, Apr 2, 2011 at 2:19 AM, Yale Zhang <yzhang1985 at gmail.com> wrote:
> > Fellow developers,
> > I'm thinking of starting a VM project to allow running x86 Windows apps on
> > ARM Android. This will obviously involve binary translation. I've read about
> > QEMU's tiny code generator and think for a usable experience,
> > the intermediate micro-op representation will have to be abandoned, and use
> > a more efficient, though less portable x86 to ARM translator. I also saw
> > some Google SOC project that tried to incorporate LLVM into QEMU, but with
> > disastrous slow down if done naively. I still think it's worth to do so, but
> > lots of care will need to be done to only optimize code that needs it like
> > Sun's HotSpot Java compiler does.
> > Questions:
> > 1. How useful would this be and how much interest?
> >    Obviously, this will be a huge project, and I just want to gauge the
> > interest before I jump in. Microsoft will be releasing Windows for ARM soon,
> > so there will be no need to worry about running Office, Matlab, Visual C++,
> > etc. on ARM, leaving only legacy applications and games to benefit from
> > binary translation. I'm mostly interested in seeing some 3D games run on my
> > Xoom.
>
> I would love such a project and am willing to help. Good x86 on ARM
> emulation is essential, and not just for Wine: Flash doesn't work on
> ARM, Java (in the form of OpenJDK) doesn't support ARM yet, there's
> the MPlayer win32codecs, etc.
>
> Complete and correct x86 emulation is mighty difficult. The total
> number of all 16/32/64/MMX/SSE instructions (as seen by the udis86
> disassembler) is 710(!!). This is excluding instruction prefixes which
> change what instructions do (eg. 16 vs 32 bit memory access). When last
> I checked, qemu didn't support all of those instructions.
>
> > 2. What's the best design:  whole system VM (qemu) or process VM (qemu &
> > wine)?
> > Process VM:
> > + easier to incorporate 3D acceleration at API level
> > + uses less memory
> > + better performance (e.g. no need for MMU translation when accessing
> > memory)
> > + much better integration with host OS
> > - needs to maintain custom Windows API implementation (Wine)
>
> * To get 3D acceleration, user-space x86 X/OpenGL drivers would have
> to be able to talk to the ARM kernel driver for that graphics card, or
> you'd need x86 to ARM wrappers for X and OpenGL libraries, or you'd
> need to use x86 kernel driver and do x86 emulation in the kernel too
> (very hard), or do whole system VM and the kind of 3D acceleration
> passthrough that VirtualBox does at the moment (which works poorly, in
> my limited experience). NVidia's ioctls are undocumented IIRC, so even
> if they provide an ARM port, translating those between x86 and ARM
> might be difficult.
>
> > Whole system VM:
> > + simpler, more unified to implement
> > + much better support for apps that are dependent on new, proprietary,
> > obscure Windows libraries, interfaces (moot because Office, Matlab, etc.
> > will soon be available for ARM)
>
> * poor integration with native desktop/filesystem
> * more to emulate -> slower
>
> > Given the aims of only running legacy applications and games, it seems a
> > foregone conclusion that Wine's process VM approach is best. Comments?
>
> Agree, but it doesn't have to be done as part of Wine. What Darwine
> did - IIRC try to make Wine DLLs PowerPC based and only the
> application x86 - seems like a bad idea: the application/Windows API
> split is badly defined and many things (eg. COM) are
> difficult/impossible to do correctly. I prefer qemu's approach: all
> user-space is x86, only the kernel is ARM.
>
> qemu-i386 doesn't even run 32 bit Wine on amd64 long mode at the
> moment (segfault on startup), I'll have to investigate at some stage.
>
> > 3.  Will Wine ever incorporate binary translation?
> >    My proposed design will obviously use Wine's implementation of the
> > Windows API, which is huge. I'm not sure how disruptive of a change binary
> > translation will be to Wine.
> >
> >    If Wine does incorporate binary translation, maybe they can change the
> > name to Wine Is Now an Emulator
> >
> > If you're interested in this project, please reply.
>
> Replying.
>
> The best way to go here would probably be improving qemu. If it turns
> out not to be good enough, rewriting the CPU emulation but keeping the
> system call translation is probably easier than a whole new project
> written from scratch.
>
> Damjan
>