
Box86/Box64 vs QEMU vs FEX (vs Rosetta2)

Comparing performances

I decided to compare the performance of the open-source Linux userspace emulators that let you run x86/x86_64 apps on ARM Linux: QEMU-user, FEX-emu and Box86/Box64.

How to bench a Linux userspace emulator

The tests consist of the benchmarks I have already used a couple of times, which can be run either natively or emulated:

  • 7-zip integrated benchmark, which contains mostly integer code (no x87 or SSE) and can be used as a baseline for pure x86 code translation efficiency. The version 16 shipped in Ubuntu was used for these tests.
  • dav1d, an open-source AV1 video decoder that includes hand-optimized SSE assembly code (SSE3 or more).
  • glmark2, which is GL limited and should run at mostly native speed (as long as GL is hardware accelerated). I couldn’t install the armhf version of glmark2 on Ubuntu, so only the native 64-bit version was benchmarked.
  • openarena, which contains x87 code and a JIT and, in that config, is very much GPU limited, so it should run very close to native speed (again, as long as GL is still hardware accelerated).

The 7z, dav1d and glmark2 benchmarks are described here, and the openarena one there.
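
For the 7-zip run, the figure worth keeping is the overall rating on the last line of the report. Here is a minimal sketch, assuming the 7z scores quoted in this post are that `Tot:` rating (in MIPS) and that the report has the usual p7zip layout; `parse_7z_total` is a hypothetical helper name:

```shell
# Hypothetical helper: print the overall rating (MIPS) from a `7z b` report
# read on stdin. Assumes the report ends with a line of the form
# "Tot:  <usage> <R/U> <rating>", as p7zip prints it.
parse_7z_total() {
    awk '/^Tot:/ { print $NF }'
}

# Typical use, native or emulated (needs 7z in PATH):
#   7z b | parse_7z_total
```

The same pipe works unchanged whether `7z` is the native binary or an x86 one picked up by an emulator through binfmt.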

I’ll also run some quick benchmarks that are not available natively. For the games, the fps will simply be measured with `GALLIUM_HUD=fps` at a stable and reproducible moment in the game:

  • WorldOfGoo: the game has simple graphics, so it should run fine. Measurements are taken on the title screen.
  • FTL: added to the benchmark after the QEMU measurements were done. Measurements are taken on the first tutorial screen, while the game is paused.
  • CINEBENCH r15: a benchmark based on a raytracing engine, with lots of SSE (SSE2 and more) code, that also uses multiple cores. It includes a CPU bench and an OpenGL bench, but only the CPU one is used here. It reports a single number indicating the performance (the higher, the better).
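
Mesa’s Gallium HUD is switched on through the `GALLIUM_HUD` environment variable, so the fps counter can be enabled per command without touching the game itself. A minimal sketch; `run_with_fps_hud` and the game path are hypothetical names, not taken from the actual setup:

```shell
# Hypothetical wrapper: run a command with Mesa's Gallium HUD fps counter on.
# GALLIUM_HUD is read by Mesa's Gallium drivers; other GL stacks ignore it.
run_with_fps_hud() {
    GALLIUM_HUD=fps "$@"
}

# Example (the binary path is an assumption):
#   run_with_fps_hud ./game.x86_64
```

Because the variable is only set for that one command, the rest of the session stays free of the overlay.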

To install WorldOfGoo, the “uname” trick will be used, as it lets me choose between the x86 and x86_64 installation (without the trick, the installer doesn’t recognise the “aarch64” platform and falls back to x86). WorldOfGoo will run at 1920×1080 fullscreen.
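
The trick itself is not spelled out here. One common way to get that effect, shown purely as an assumption about how it can be done, is a small `uname` shim placed first in `PATH` that lies only about the machine type. `make_uname_shim`, the shim directory and the installer name are all hypothetical:

```shell
# Hypothetical helper: create a fake `uname` in directory $1 that reports
# the machine type given as $2 (e.g. i686 or x86_64) for `uname -m`.
# Every other invocation is forwarded to the real uname.
make_uname_shim() (
    dir=$1 arch=$2
    real=$(command -v uname) || exit 1
    mkdir -p "$dir" || exit 1
    cat > "$dir/uname" <<EOF
#!/bin/sh
# Shim generated by make_uname_shim: lie about the machine type only.
if [ "\$1" = "-m" ]; then
    echo "$arch"
else
    exec "$real" "\$@"
fi
EOF
    chmod +x "$dir/uname"
)

# Example (installer name is an assumption):
#   make_uname_shim "$HOME/fake-uname" x86_64
#   PATH="$HOME/fake-uname:$PATH" sh ./installer.sh
```

The shim only affects commands started with the modified `PATH`, so the rest of the system keeps seeing “aarch64”.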

CINEBENCH r15 needs Wine, and a 64-bit version of it. It’s a benchmark built on the CINEMA 4D engine.

After some testing, I realized that both openarena and WorldOfGoo mainly use x87 code, at least for the 32-bit version of WorldOfGoo. Both QEMU and FEX seem to use softfloat for it, to keep the 80-bit precision, while box uses hardware floats (with some tricks to keep 80 bits when needed, such as for some data copies used by old games). So I decided to also check the menu page of FTL, which I know uses SSE code. I didn’t test it on QEMU, though (it’s not hardware accelerated anyway, so it would be too slow). FTL will run at its default resolution of 1280×720, windowed. I’ll launch the tutorial, answer the first dialog box and measure the fps at that point.

Machine used

The test machine is an RPI400 (so, 4 big cores @1.8GHz and 4GB of RAM), running a 64-bit Ubuntu 21.10. Here are the common steps to prepare the OS (after turning off the “Blank screen” option in the “Power” settings), as I’ll be reinstalling the OS between each emulator:

sudo apt update && sudo apt upgrade -y

reboot, then install the ssh server

sudo apt install openssh-server -y

And then install some libs, for games

sudo apt install libsdl1.2debian libopenal1 -y

The Pi 400 used for this benchmark, running Ubuntu impish.

QEMU-user

Installation is very simple on Ubuntu, as it’s part of the repo:

sudo dpkg --add-architecture i386
sudo dpkg --add-architecture amd64

Then add the correct repo, create /etc/apt/sources.list.d/i386.list with this:

deb [arch=i386] http://security.ubuntu.com/ubuntu/ impish-security  main restricted universe multiverse
deb [arch=i386]  http://archive.ubuntu.com/ubuntu/ impish           main restricted universe multiverse
deb [arch=i386]  http://archive.ubuntu.com/ubuntu/ impish-updates   main restricted universe multiverse
deb [arch=i386]  http://archive.ubuntu.com/ubuntu/ impish-backports main restricted universe multiverse

And /etc/apt/sources.list.d/amd64.list with this

deb [arch=amd64] http://security.ubuntu.com/ubuntu/ impish-security  main restricted universe multiverse
deb [arch=amd64]  http://archive.ubuntu.com/ubuntu/ impish           main restricted universe multiverse
deb [arch=amd64]  http://archive.ubuntu.com/ubuntu/ impish-updates   main restricted universe multiverse
deb [arch=amd64]  http://archive.ubuntu.com/ubuntu/ impish-backports main restricted universe multiverse
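
Since the two files differ only by the architecture tag, they can also be generated rather than typed. A minimal sketch, where `foreign_sources` is a hypothetical helper and the output is meant to match the lines listed above:

```shell
# Hypothetical helper: print the four "deb" lines for one foreign
# architecture ($1) and one Ubuntu release ($2, e.g. impish).
foreign_sources() {
    printf 'deb [arch=%s] http://security.ubuntu.com/ubuntu/ %s-security main restricted universe multiverse\n' "$1" "$2"
    for pocket in "" -updates -backports; do
        printf 'deb [arch=%s] http://archive.ubuntu.com/ubuntu/ %s%s main restricted universe multiverse\n' "$1" "$2" "$pocket"
    done
}

# Example (writing under /etc needs root):
#   foreign_sources i386  impish | sudo tee /etc/apt/sources.list.d/i386.list
#   foreign_sources amd64 impish | sudo tee /etc/apt/sources.list.d/amd64.list
```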

Then update with

sudo apt update

Install qemu-user and the binfmt integration with

sudo apt install qemu-user-binfmt

And now some libs with

sudo apt install libgtk2.0-0:i386 libgtk2.0-0:amd64 libsdl2-image-2.0-0:i386 libsdl2-image-2.0-0:amd64 libgl1:i386 libgl1:amd64 libsdl1.2debian:i386 libopenal1:i386 libsdl1.2debian:amd64 libopenal1:amd64 libvorbisfile3:i386 libvorbisfile3:amd64 -y

QEMU doesn’t integrate a pass-through mechanism for GL by default, so the rendering is done in software. Even worse, it’s done in emulated software, so graphics performance is really poor with that config.
glmark2 was also corrupted on i386: anything involving textures didn’t render correctly. The amd64 version rendered correctly.
The x86_64 emulation is slow, but solid (emulating llvmpipe is quite an achievement). With openarena, I couldn’t get the i386 version running: it complained about missing files at start. There was no problem with the x86_64 version, but it was very slow, and at 12 seconds per frame, I stopped before the end of the benchmark.
For Wine, I tried a 6.0.1 version (from the PlayOnLinux build bot), but it crashed with a segfault, so I tried the version from the Ubuntu repo, installed with sudo apt install wine64:amd64, which worked fine (version 5.0.3).
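
To confirm whether a given setup ends up on the software renderer, the renderer string reported by `glxinfo` is enough. A minimal sketch; `is_software_gl` is a hypothetical helper, and the check assumes Mesa’s software rasterizers identify themselves as llvmpipe or softpipe:

```shell
# Hypothetical helper: read `glxinfo` output on stdin and succeed when the
# renderer is one of Mesa's software rasterizers (llvmpipe or softpipe).
is_software_gl() {
    grep -i 'OpenGL renderer string' | grep -Eqi 'llvmpipe|softpipe'
}

# Example (glxinfo comes from the mesa-utils package on Ubuntu):
#   glxinfo | is_software_gl && echo "software rendering"
```

Running the x86 `glxinfo` through the emulator, rather than the native one, is what tells you which renderer the emulated programs actually get.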

FEX-emu

Installation: I simply followed the instructions on the github README, as Ubuntu 21.10 is a supported OS. But first, curl needed to be installed with

sudo apt install curl

And then, from the readme:

curl --silent https://raw.githubusercontent.com/FEX-Emu/FEX/main/Scripts/InstallFEX.py --output /tmp/InstallFEX.py && python3 /tmp/InstallFEX.py && rm /tmp/InstallFEX.py

I had some binfmt service error message at the beginning of the install, but the installation continued, and, after a few questions, downloaded 930M for the rootfs. The version installed was the 2203. Despite the error messages (in red), everything installed smoothly and worked fine.

I didn’t need the “uname” trick to install WorldOfGoo; using `FEXBash` was enough. Also, the FEX rootfs has most of the libs needed to run things, so while the initial download is big, there isn’t much to install after that.
There is an actual OpenGL pass-through in FEX, so 3D acceleration works out of the box. Also, FEX handles both 32-bit and 64-bit on Aarch64, with no need for an armhf multiarch setup.
There were some strange freezes in some of the glmark2 tests (jellyfish, terrain, shadow and refract in 32-bit only, and terrain in 64-bit) that negatively affected the bench result. Also, you can feel when new blocks are being compiled by FEX: it seems to be quite a heavy process (which probably lowers some of the bench scores).
For Wine, I tried a 6.0.1 version from PlayOnLinux, but it segfaulted. I tried the version integrated in the rootfs (running wine64-stable), but it failed to create a window. I also tried a 5.0 version, but got the same segfault. So unfortunately I couldn’t test CINEBENCH.
It should also be noted that I got no sound in any of the games.

Box86/Box64

I will follow the COMPILE.md from the github of each project to install. But I need some build tools first:

sudo apt install git cmake -y

Then, fetch and build box64:

cd ~
git clone https://github.com/ptitSeb/box64
cd box64
mkdir build; cd build; cmake .. -DRPI4ARM64=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j4
sudo make install

For box86, we need some more build-tools:

sudo apt install gcc-arm-linux-gnueabihf -y

And fetch and build box86 with

cd ~
git clone https://github.com/ptitSeb/box86
cd box86
mkdir build; cd build; cmake .. -DRPI4ARM64=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j2
sudo make install

Build time is a bit long on the Pi400 (I should provide precompiled binaries to ease the install process). Restart the binfmt integration so that box86 and box64 are called automatically:

sudo systemctl restart systemd-binfmt
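
To check which emulator the kernel will actually hand x86 binaries to, the registered entries can be read back from binfmt_misc. A minimal sketch; `binfmt_interpreter` is a hypothetical helper, and the entry names you will see depend on what each package registered:

```shell
# Hypothetical helper: print the interpreter of one binfmt_misc entry.
# $1 is an entry file such as /proc/sys/fs/binfmt_misc/<name>; the kernel
# exposes a line of the form "interpreter /path/to/emulator" in it.
binfmt_interpreter() {
    awk '$1 == "interpreter" { print $2 }' "$1"
}

# Example: list every registered handler.
#   for f in /proc/sys/fs/binfmt_misc/*; do
#       case ${f##*/} in register|status) continue ;; esac
#       echo "${f##*/}: $(binfmt_interpreter "$f")"
#   done
```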

Now, we need to add armhf architecture and grab some libraries, for box86:

sudo dpkg --add-architecture armhf
sudo apt update
sudo apt install libgtk2.0-0:armhf libsdl2-image-2.0-0:armhf libsdl1.2debian:armhf libopenal1:armhf libvorbisfile3:armhf libgl1:armhf libjpeg62:armhf libcurl4:armhf libasound2-plugins:armhf -y

The setup of WorldOfGoo worked fine with the “uname” trick (although some error messages popped up on the console), and I could easily install both the x86 and x86_64 versions with box64. But the setup of FTL didn’t work with box64 (I guess some more work on GTK wrapping has to be done there). It did work with box86 (so without using the uname trick).

To install Wine, I used a 64-bit version 6.0.1 from the PlayOnLinux build bot, but many other versions of Wine are available now. You can also use staging and tkg versions with box86+box64. My installation lives in the home directory, with just a few shortcuts in /usr/local/bin, but it can be done in different ways (the shortcuts are just for convenience):

cd ~
wget https://www.playonlinux.com/wine/binaries/phoenicis/upstream-linux-amd64/PlayOnLinux-wine-6.0.1-upstream-linux-amd64.tar.gz
mkdir wine
cd wine
tar xf ../PlayOnLinux-wine-6.0.1-upstream-linux-amd64.tar.gz
sudo ln -s $(pwd)/bin/wine /usr/local/bin/wine
sudo ln -s $(pwd)/bin/wine64 /usr/local/bin/wine64
sudo ln -s $(pwd)/bin/wineserver /usr/local/bin/wineserver
sudo ln -s $(pwd)/bin/winecfg /usr/local/bin/winecfg
sudo ln -s $(pwd)/bin/wineboot /usr/local/bin/wineboot

Once this (long) setup was done, everything was working, sound included.

Benchmarks

Here are the results I collected.

Emus vs Native:

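| Bench | QEMU x86 | QEMU x86_64 | FEX x86 | FEX x86_64 | Box86 | Box64 | Armhf | Aarch64 |
|---|---|---|---|---|---|---|---|---|
| 7z | 691 | 931 | 1197 | 1486 | 3117 | 3084 | 6157 | 5787 |
| dav1d 1t | 10.57 | 11.24 | 21.75 | 27.28 | 49.94 | 49.72 | 170.07 | 185.57 |
| dav1d 4t | 11.47 | 15.65 | 45.31 | 52.67 | 118.97 | 116.64 | 290.99 | 312.23 |
| glmark2 | 2 | 4 | 139 | 164 | 179 | 178 | N/A | 181 |
| openarena | error | 0.083 | 0.9 | 1.8 | 4.2 | 8.4 | 5.2 | 8.4 |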
The bigger, the better

Here the first 3 benchmarks are CPU only and show the efficiency of the translated code. The last two, on the other hand, are GPU limited, and in those cases an emulated speed of 100% of native can be achieved.
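
To read the CPU rows as a fraction of native speed, each emulated score can be divided by the matching native column. A small sketch using the 7z row; `pct_of_native` is a hypothetical helper name:

```shell
# Hypothetical helper: express an emulated score ($1) as a percentage of
# the native score ($2), rounded to one decimal.
pct_of_native() {
    awk -v emu="$1" -v nat="$2" 'BEGIN { printf "%.1f\n", 100 * emu / nat }'
}

# 7z, 64-bit emulators against the Aarch64 native score:
pct_of_native 3084 5787   # Box64
pct_of_native 1486 5787   # FEX x86_64
pct_of_native 931 5787    # QEMU x86_64
```

On the 7z numbers this works out to roughly 53% of native for Box64, 26% for FEX and 16% for QEMU.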

For the other benchmarks, the data are sparser:

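| Software | QEMU | FEX | Box |
|---|---|---|---|
| WorldOfGoo x86 (fps) | won’t start | 2.2 | 37 |
| WorldOfGoo x86_64 (fps) | won’t start | 37 | 37 |
| FTL x86 (fps) | N/A | 32 | 52 |
| FTL x86_64 (fps) | N/A | 42 | 52 |
| CINEBENCH r15 | 10 | can’t run wine | 54 |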
The bigger, the better

WorldOfGoo is interesting: the 64-bit version seems GPU limited, and no matter which emulator is used (Box64 or FEX), you get the same speed.

Box64 vs Rosetta2

As a bonus, I did a quick comparison of Rosetta2 and Box64. For this, I used 7zip on macOS and compared it with 7zip on Asahi with Box64. The machine used is a MacBook Pro with an M1 SoC. On macOS, I used a 7z version downloaded from 7-zip.org (latest version: 21), and used lipo to split it into the arm64 and x86_64 versions of 7z. On Linux, the packaged version was only 17 and seems to have fewer arm64 optimisations, so I also downloaded version 21 from the official website there. As a consequence, these results cannot be compared to the other tests (but I did a run with the old 7z version to see how a regular M1 compares to a Pi400). Also, keep in mind that there is no hardware acceleration on Asahi for now, so refreshing the screen and the terminal window (I wasn’t using ssh here) might use some cycles (probably explaining why the native 7z bench scores 10% higher on macOS than on Linux). I couldn’t include FEX in the bench, as it’s not compatible with the 16k pages currently used on Asahi/M1.
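
The lipo step can be sketched as follows. This is only an illustration: `split_universal` is a hypothetical helper, and the `7zz` file name in the example is an assumption about what the download contains:

```shell
# Hypothetical helper: split a universal (fat) Mach-O binary ($1) into one
# thin binary per architecture, written to directory $2. macOS only.
split_universal() (
    command -v lipo >/dev/null 2>&1 || { echo "lipo not found" >&2; exit 1; }
    [ -f "$1" ] || { echo "no such file: $1" >&2; exit 1; }
    name=${1##*/}
    for arch in arm64 x86_64; do
        lipo "$1" -thin "$arch" -output "$2/$name-$arch" || exit 1
    done
)

# Example (the file name is an assumption):
#   split_universal ./7zz .
#   ./7zz-arm64 b     # runs natively
#   ./7zz-x86_64 b    # x86_64 only, so macOS runs it through Rosetta 2
```

Splitting the binary makes sure each run really uses the intended architecture, instead of letting macOS pick the native slice of the universal file.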

| Bench | macOS native | macOS Rosetta2 | Linux native | Linux Box64 |
|---|---|---|---|---|
| 7z | 47239 | 33746 | 43271 | 24746 |
The bigger, the better

For reference, the older 16.02 7z binary (the same as in the FEX and QEMU comparison) on the Mac M1 with box64 gives me 17942 (yeah, almost 6x faster than the Pi400), while dav1d gives 354.86 with 1 thread (7x faster) and 636.42 with 4 threads (5.5x faster), all still with box64. Yeah, the M1 is quite a beast!

So, Rosetta2 runs at 71% of native speed on 7zip, while Box64 is at 57%. Taking into account that some optimisations are still to be done in the dynarec, and that none of the advanced Arm64 opcodes are used in Box64, it’s not too bad.

14 replies on “Box86/Box64 vs QEMU vs FEX (vs Rosetta2)”

If I understand correctly, only QEMU wasn’t compiled from sources. Care should be taken to ensure the exact same compiler flags are used for each competitor. This evens out any differences in the build environment and would allow for a more accurate comparison.

I agree, but the difference is too big for compile flags to matter, especially with a JIT/dynarec, as compilation flags have no influence on the emitted code.

After receiving my new ARM board, I was thinking: if only we had a Rosetta 2-like tool for emulating x86 on any ARM device.
And by chance, shortly after, I landed on this page.
Nice work! So it seems you managed to do what Apple did to make x86 apps work on M1?
Are you still looking to improve performance and compatibility, so we could expect (as with Rosetta 2) to run our Steam games from an ARM board?

Yes, I’m working on box to improve speed and compatibility. Look at the github pages, you’ll see they are both very much alive!
Also, steam is already working (with “small mode”), and many games can be run from the steam library.

Yes, RK3588 seems awesome. It’s just that the mesa driver is not there yet. That’s why the guide you linked uses gl4es, another of my projects 😉, to provide an OpenGL 2.1 context on top of the Mali GLES2 blob.
There is a fork of mesa called “panfork” (by icecream95, who posted on the minecraft guide) that has some early RK3588 support, but it’s not stable yet…

I see you are already aware of the work to get graphics acceleration working for this SOC.. at least on linux.
Have to give a try to panfork and do some testing with it.

Thanks for explaining, the Mali blob is providing GLES driver (so mobile graphics) and your lib GL4ES is making the bridge to OpenGL (so that desktop graphics can be used on mobile GPU) . Clearer for me now 🙂
