How to bench Linux userspace emulators
The tests consist of benchmarks I have already used a couple of times, all of which can be run either natively or emulated:
- The 7-zip integrated benchmark, which is mostly integer code (no x87 or SSE) and can serve as a baseline for pure x86 code translation efficiency. Version 16, as shipped in Ubuntu, was used for these tests.
- dav1d, an open-source AV1 video decoder that includes hand-optimized SSE assembly code (SSE3 or more).
- glmark2, which is GL limited and should run at close to native speed (as long as GL is hardware accelerated). I couldn’t install the armhf version of glmark2 on Ubuntu, so only the native 64-bit version was benchmarked.
- openarena, which contains x87 code and a JIT and, in this configuration, is very much GPU limited, so it should run very close to native speed (again, as long as GL is hardware accelerated).
I’ll also run some quick benchmarks that are not available natively. The fps is simply measured with `GALLIUM_HUD=fps` at a stable and reproducible moment in the game:
- WorldOfGoo: The game has simple graphics, so it should run fine. Measurements are taken on the title screen.
- FTL: I added it to the bench after doing the QEMU measurements. Measurements are taken on the first tutorial screen, while the game is paused.
- CINEBENCH r15: A benchmark based on a raytracing engine. It has lots of SSE code (SSE2 and more) and is multi-core. It includes a CPU bench and an OpenGL bench, but only the CPU one is used here. It reports a single performance number (the higher, the better).
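The fps counter shown by the HUD fluctuates a little even on a static screen, so one option is to note a few readings and average them. The helper below is only a sketch of that idea: `avg_fps` is a hypothetical name, and it assumes the readings are typed in by hand, one per line.

```shell
# Average a list of fps readings given one per line on stdin.
# Blank lines are ignored; nothing is printed if there are no readings.
avg_fps() {
    awk 'NF { sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Usage: printf '36\n37\n38\n' | avg_fps
```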
To install WorldOfGoo, the “uname” trick is used, as it lets me choose between the x86 and x86_64 installation (without the trick, the installer doesn’t recognise the “aarch64” platform and falls back to x86). WorldOfGoo runs at 1920×1080 fullscreen.
CINEBENCH r15 needs Wine, and a 64-bit version of it. The benchmark is built on the CINEMA 4D engine.
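The trick itself is not spelled out here. One way to sketch the idea is a small wrapper named `uname`, placed ahead of the real one in `PATH`, that reports the machine type the installer should see. The function name `make_fake_uname`, the directory and the installer name below are all placeholders, not something taken from the original setup.

```shell
# Create a fake `uname` in directory $1 that answers `-m` with the
# architecture given as $2 and defers to the real uname otherwise.
# Call this while the real uname is still the first one on PATH.
make_fake_uname() {
    dir=$1
    arch=$2
    real=$(command -v uname)   # path of the real uname, baked into the wrapper
    mkdir -p "$dir"
    cat > "$dir/uname" <<EOF
#!/bin/sh
# Fake uname: pretend to be $arch when asked for the machine type.
if [ "\$1" = "-m" ]; then
    echo "$arch"
else
    exec "$real" "\$@"
fi
EOF
    chmod +x "$dir/uname"
}

# Usage (directory and installer name are placeholders):
#   make_fake_uname "$HOME/fakebin" x86_64
#   PATH="$HOME/fakebin:$PATH" sh ./installer.sh
```

Passing `i686` instead of `x86_64` would steer the same installer towards its 32-bit payload, assuming it only looks at `uname -m`.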
After some testing, I realized that both openarena and WorldOfGoo mainly use x87 code, at least for the 32-bit version of WorldOfGoo. Both QEMU and FEX seem to use softfloat for it, to keep the 80-bit precision, while box uses hardware floats (with some tricks to keep 80 bits when needed, as in some data copies used by old games). So I decided to also check the menu page of FTL, which I know uses SSE code. I didn’t test it on QEMU, though (it’s not hardware accelerated anyway, so it would be too slow). FTL runs at its default resolution of 1280×720, windowed. I launch the tutorial, answer the first dialog box and measure the fps at this point.
The test machine is an RPI400 (so, 4 big cores @1.8GHz and 4GB of RAM) running Ubuntu 21.10 64-bit. Here are the common steps to prepare the OS (after turning off the “Blank screen” option in the “Power” settings), as I reinstalled the OS between each emulator:
sudo apt update && sudo apt upgrade -y
reboot, then install the ssh server
sudo apt install openssh-server -y
And then install some libs, for games
sudo apt install libsdl1.2debian libopenal1 -y
Installing QEMU is very simple on Ubuntu, as it’s part of the repo. First, add the x86 architectures:
sudo dpkg --add-architecture i386
sudo dpkg --add-architecture amd64
Then add the correct repo: create /etc/apt/sources.list.d/i386.list with this:
deb [arch=i386] http://security.ubuntu.com/ubuntu/ impish-security main restricted universe multiverse
deb [arch=i386] http://archive.ubuntu.com/ubuntu/ impish main restricted universe multiverse
deb [arch=i386] http://archive.ubuntu.com/ubuntu/ impish-updates main restricted universe multiverse
deb [arch=i386] http://archive.ubuntu.com/ubuntu/ impish-backports main restricted universe multiverse
And /etc/apt/sources.list.d/amd64.list with this:
deb [arch=amd64] http://security.ubuntu.com/ubuntu/ impish-security main restricted universe multiverse
deb [arch=amd64] http://archive.ubuntu.com/ubuntu/ impish main restricted universe multiverse
deb [arch=amd64] http://archive.ubuntu.com/ubuntu/ impish-updates main restricted universe multiverse
deb [arch=amd64] http://archive.ubuntu.com/ubuntu/ impish-backports main restricted universe multiverse
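Since the two files differ only by the architecture tag, they can also be generated rather than typed. This is just a convenience sketch: `gen_sources` is a hypothetical helper, and the release name defaults to `impish` to match the lines above.

```shell
# Print the four apt source lines for one foreign architecture.
# $1 = architecture (i386 or amd64), $2 = release codename (default: impish)
gen_sources() {
    arch=$1
    release=${2:-impish}
    # The security pocket lives on its own host.
    echo "deb [arch=$arch] http://security.ubuntu.com/ubuntu/ $release-security main restricted universe multiverse"
    # Main, updates and backports pockets share the archive host.
    for suffix in "" -updates -backports; do
        echo "deb [arch=$arch] http://archive.ubuntu.com/ubuntu/ $release$suffix main restricted universe multiverse"
    done
}

# Usage:
#   gen_sources i386  | sudo tee /etc/apt/sources.list.d/i386.list
#   gen_sources amd64 | sudo tee /etc/apt/sources.list.d/amd64.list
```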
Then update with
sudo apt update
Install qemu-user and binfmt integration with
sudo apt install qemu-user-binfmt
And now some libs with
sudo apt install libgtk2.0-0:i386 libgtk2.0-0:amd64 libsdl2-image-2.0-0:i386 libsdl2-image-2.0-0:amd64 libgl1:i386 libgl1:amd64 libsdl1.2debian:i386 libopenal1:i386 libsdl1.2debian:amd64 libopenal1:amd64 libvorbisfile3:i386 libvorbisfile3:amd64 -y
QEMU doesn’t integrate a pass-through mechanism for GL by default, so the rendering is done in software. Even worse, it’s done in emulated software, so graphics performance is really not good with this config.
glmark2 rendering was also corrupted on i386: anything involving textures didn’t render correctly. The amd64 version rendered correctly.
The x86_64 emulation is slow, but solid (emulating llvmpipe is quite an achievement). With openarena, I couldn’t get the i386 version running: it complained about missing files at start. There was no problem with the x86_64 version, but it was very slow, and at 12 seconds per frame, I stopped before the end of the benchmark.
For Wine, I tried a 6.0.1 version (from the PlayOnLinux build bot), but it crashed with a segfault, so I tried the version from the Ubuntu repo, installed with `sudo apt install wine64:amd64`, which worked fine (version 5.0.3).
FEX installation: I simply followed the instructions in the GitHub README, as Ubuntu 21.10 is a supported OS. But first, curl needed to be installed with
sudo apt install curl
And then, from the readme:
curl --silent https://raw.githubusercontent.com/FEX-Emu/FEX/main/Scripts/InstallFEX.py --output /tmp/InstallFEX.py && python3 /tmp/InstallFEX.py && rm /tmp/InstallFEX.py
I had some binfmt service error messages at the beginning of the install, but the installation continued and, after a few questions, downloaded 930M for the rootfs. The version installed was 2203. Despite the error messages (in red), everything installed smoothly and worked fine.
I didn’t need the “uname” trick to install WorldOfGoo; using `FEXBash` was enough. Also, the FEX rootfs has most of the libs needed to run things, so while the initial download is big, there isn’t much to install after that.
There is an actual OpenGL pass-through in FEX, so 3D acceleration works out of the box. Also, FEX handles both 32-bit and 64-bit on Aarch64, so there is no need for an Armhf multiarch setup.
Some glmark2 tests froze strangely (jellyfish, terrain, shadow and refract in 32-bit only, and terrain in 64-bit), which negatively affected the bench results. Also, you can feel when new blocks are being compiled by FEX; it seems to be quite a heavy process (which probably lowers some of the bench scores).
For Wine, I tried a 6.0.1 version from PlayOnLinux but it segfaulted. I tried to use the version integrated in the rootfs (running wine64-stable), but it failed to create a window. I also tried a 5.0 version, but got the same segfault. So unfortunately I couldn’t test CINEBENCH.
It should also be noted that I got no sound in any game.
For box86 and box64, I followed the COMPILE.md from each project’s GitHub repository to install. But I needed some build tools first:
sudo apt install git cmake -y
Then, fetch and build box64:
cd ~
git clone https://github.com/ptitSeb/box64
cd box64
mkdir build; cd build; cmake .. -DRPI4ARM64=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j4
sudo make install
For box86, we need some more build tools:
sudo apt install gcc-arm-linux-gnueabihf -y
And fetch and build box86 with
cd ~
git clone https://github.com/ptitSeb/box86
cd box86
mkdir build; cd build; cmake .. -DRPI4ARM64=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j2
sudo make install
Build time is a bit long on the Pi400 (I should provide precompiled binaries to ease the install process). Restart the binfmt integration so box86 & box64 will be called automatically:
sudo systemctl restart systemd-binfmt
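The binfmt integration picks an emulator by looking at the header of the binary being launched, so it can be handy to check which architecture a given game binary actually declares. The function below is a rough sketch that reads the `e_machine` field of the ELF header with `od`; `elf_machine` is a hypothetical helper, it assumes a little-endian ELF file, and it only knows the four architectures discussed here.

```shell
# Print the architecture declared in an ELF file's header.
# e_machine is a 2-byte field at offset 18 (assumed little-endian here).
elf_machine() {
    # Read bytes 18 and 19 as unsigned decimals, e.g. "62 0".
    set -- $(od -An -tu1 -j18 -N2 "$1")
    case $(( ${1:-0} + ${2:-0} * 256 )) in
        3)   echo i386 ;;      # EM_386
        40)  echo arm ;;       # EM_ARM
        62)  echo x86_64 ;;    # EM_X86_64
        183) echo aarch64 ;;   # EM_AARCH64
        *)   echo unknown ;;
    esac
}

# Usage: elf_machine /path/to/some/binary
```

On this setup, a result of `i386` would be a binary meant for box86, and `x86_64` one meant for box64.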
Now, we need to add the armhf architecture and grab some libraries for box86:
sudo dpkg --add-architecture armhf
sudo apt update
sudo apt install libgtk2.0-0:armhf libsdl2-image-2.0-0:armhf libsdl1.2debian:armhf libopenal1:armhf libvorbisfile3:armhf libgl1:armhf libjpeg62:armhf libcurl4:armhf libasound2-plugins:armhf -y
The setup of WorldOfGoo worked fine with the “uname” trick (although some error messages popped up on the console), and I could easily install both the x86 and x86_64 versions with box64. But the setup of FTL didn’t work with box64 (I guess some more work on GTK wrapping has to be done there). It did work with box86 (so without using the uname trick).
To install Wine, I used a 64-bit version 6.0.1 from the PlayOnLinux build bot, but many other versions of Wine are available now. You can also use staging and tkg versions with box86+box64. My installation lives in the home directory, with just a few shortcuts in /usr/local/bin, but it can be done in different ways (and the shortcuts are just for convenience):
cd ~
wget https://www.playonlinux.com/wine/binaries/phoenicis/upstream-linux-amd64/PlayOnLinux-wine-6.0.1-upstream-linux-amd64.tar.gz
mkdir wine
cd wine
tar xf ../PlayOnLinux-wine-6.0.1-upstream-linux-amd64.tar.gz
sudo ln -s $(pwd)/bin/wine /usr/local/bin/wine
sudo ln -s $(pwd)/bin/wine64 /usr/local/bin/wine64
sudo ln -s $(pwd)/bin/wineserver /usr/local/bin/wineserver
sudo ln -s $(pwd)/bin/winecfg /usr/local/bin/winecfg
sudo ln -s $(pwd)/bin/wineboot /usr/local/bin/wineboot
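The five shortcuts follow the same pattern, so they could also be created in a loop. The sketch below is one of the "different ways" of doing it: `link_wine_tools` is a hypothetical helper, and it takes the destination as a parameter so that a user-writable directory can be used instead of /usr/local/bin (which would need root).

```shell
# Symlink the Wine launchers from an unpacked build into a bin directory.
# $1 = directory where the tarball was unpacked (contains bin/)
# $2 = destination directory, expected to be on PATH
link_wine_tools() {
    src=$1
    dest=$2
    for tool in wine wine64 wineserver winecfg wineboot; do
        # -f replaces an existing link, so the function can be re-run
        ln -sf "$src/bin/$tool" "$dest/$tool"
    done
}

# Usage (the destination here is only an example):
#   link_wine_tools "$HOME/wine" "$HOME/.local/bin"
```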
Once this (long) setup was done, everything was working, sound included.
Here are the results I collected.
Emus vs Native:
|Bench||QEMU x86||QEMU x86_64||FEX x86||FEX x86_64||Box86||Box64||Armhf||Aarch64|
Here the first 3 benchmarks are CPU only, and show the efficiency of the translated code. The last two benchmarks, on the other hand, are GPU limited, and in those cases an emulated speed of 100% of native speed can be achieved.
For the other benchmarks, the data are sparser:
|Bench||QEMU||FEX||Box86/Box64|
|WorldOfGoo x86 (fps)||won’t start||2.2||37|
|WorldOfGoo x86_64 (fps)||won’t start||37||37|
|FTL x86 (fps)||N/A||32||52|
|FTL x86_64 (fps)||N/A||42||52|
|CINEBENCH r15||10||can’t run wine||54|
WorldOfGoo is interesting: the 64-bit version appears to be GPU limited, and whichever emulator is used (Box64 or FEX), you get the same speed.
Box64 vs Rosetta2
As a bonus, I did a quick comparison of Rosetta2 and Box64. For this, I used 7zip on macOS and compared it with 7zip on Asahi with Box64. The machine used is a MacBookPro with an M1 SoC. On macOS I used a 7z version downloaded from 7-zip.org (latest version: 21), and used lipo to split the arm64 and x86_64 versions of 7z. On Linux, the packaged version was only 17 and seems to have fewer arm64 optimisations, so I also downloaded version 21 from the official website there. As a consequence, these results cannot be compared to the other tests (but I did a run on the old 7z version to see how a regular M1 compares to a Pi400)… Also, keep in mind that there is no hardware acceleration on Asahi for now, so refreshing the screen and the terminal window (I wasn’t using ssh here) might use some cycles (probably explaining why the native 7z bench scores 10% higher on macOS than on Linux). I couldn’t include FEX in the bench as it’s not compatible with the 16k pages currently used on Asahi/M1.
|macOS Native||macOS Rosetta2||Linux Native||Linux Box64|
For reference, the older 16.02 7z binary (the same as in the FEX and QEMU comparison) on the Mac M1 with box64 gives me 17942 (yeah, almost 6x faster than the Pi400), dav1d with 1 thread gives 354.86 (7x faster) and with 4 threads 636.42 (5.5x faster), all still with box64. Yeah, the M1 is quite a beast!
So, Rosetta2 is at 71% of native speed on 7zip, while Box64 is at 57% of native speed. Taking into account that some optimisations are still to be done in the dynarec, and that none of the advanced Arm64 opcodes are used by Box64, it’s not too bad.
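The percentages quoted above are simply the emulated score divided by the native score. As a small sketch of that arithmetic, the hypothetical helper `pct_of_native` below does the division with awk; the numbers in the usage line are made up for illustration and are not measurements.

```shell
# Express an emulated benchmark score as a percentage of the native score.
# $1 = emulated score, $2 = native score
pct_of_native() {
    awk -v e="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * e / n }'
}

# Usage (illustrative numbers only):
#   pct_of_native 57 100
```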