
New version of box64 v0.3.2 and box86 v0.3.8

A new version of both box64 and box86 has been released!

While the changelog for box86 doesn’t contain much, there are a lot of new improvements, features and fixes for box64. With speed increases, improved compatibility and a new subproject, box32, there is a lot to cover. Let’s dive into the details of the performance improvements and that new box32 (sub)project.

Performance improvements

There have been several performance improvements in this development cycle. Along with the usual small code-generation improvements here and there, and support for more opcodes on RV64 and LA64, this cycle added support for “native flags” and brought a much-needed rework of the strong memory model emulation.

Native flags on ARM64

Native flags support means that on ARM64 machines, box64 now tries its best to use the ARM flags to match the x86 flags. While this is not always possible, there are still many cases where a 1:1 mapping can be done. This can bring big improvements on CPU-intensive tasks. For example, the 7zip benchmark gains up to 30% speed on ARM64 with native flags support! Its results had stayed more or less the same for many years, until native flags (and especially the handling of the Carry flag) were added to box64.
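To see why the Carry flag deserves special attention, here is a minimal C model of the flag definitions themselves, written from the documented semantics of both architectures (ARM64's AddWithCarry pseudocode and the x86 CF rules), not taken from box64's sources. After an addition, ARM64's C and x86's CF mean the same thing (unsigned carry out). After a subtraction or compare, ARM64 computes a + ~b + 1, so C = 1 means "no borrow", which is the inverse of x86's CF. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM64 AddWithCarry(x, y, carry_in) for 32-bit ops: C is the carry out of bit 31. */
static bool arm_carry_out(uint32_t x, uint32_t y, bool carry_in) {
    uint64_t wide = (uint64_t)x + (uint64_t)y + (carry_in ? 1u : 0u);
    return (wide >> 32) != 0;
}

/* ARM64 ADDS a, b is AddWithCarry(a, b, 0). */
static bool arm_c_after_adds(uint32_t a, uint32_t b) { return arm_carry_out(a, b, false); }

/* ARM64 SUBS/CMP a, b is AddWithCarry(a, ~b, 1): C = 1 means "no borrow" (a >= b). */
static bool arm_c_after_subs(uint32_t a, uint32_t b) { return arm_carry_out(a, ~b, true); }

/* x86 ADD a, b: CF = unsigned carry out. */
static bool x86_cf_after_add(uint32_t a, uint32_t b) { return (uint32_t)(a + b) < a; }

/* x86 SUB/CMP a, b: CF = unsigned borrow, i.e. a < b. */
static bool x86_cf_after_sub(uint32_t a, uint32_t b) { return a < b; }

/* The mapping a translator can rely on: same meaning after add, inverted after sub. */
static void check_carry_mapping(uint32_t a, uint32_t b) {
    assert(arm_c_after_adds(a, b) == x86_cf_after_add(a, b));
    assert(arm_c_after_subs(a, b) == !x86_cf_after_sub(a, b));
}
```

So a translator that keeps the x86 CF in the ARM C flag has to account for that inversion after subtractions, for example by testing the opposite condition (x86 JB, "CF set", corresponds to ARM64 B.LO, "C clear", after a compare).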

Native flags on RISC-V

But we also added native flags to RV64! How, you might ask, when RV64 doesn’t have any flags? True, it doesn’t, but there is a trick. In x86 code, a comparison is often immediately followed by a conditional jump, and in many cases the flags generated by the comparison (or math operation) are only used by that conditional jump. In that case, box64 does not update the flags, but instead fuses the comparison and the conditional jump, avoiding a useless internal flags update. The speedup here is also substantial on the 7zip benchmark.
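The fusion is only valid because, for every condition, evaluating the x86 Jcc test on materialized flags gives the same answer as comparing the operands directly, which RISC-V branches (beq, bne, blt, bge, bltu, bgeu) can do in one instruction. The C sketch below checks that equivalence for a few conditions. It is an illustration, not box64's code generator: the enum and function names are hypothetical, and it assumes 32-bit operands (on RV64 they would first need to be sign- or zero-extended to 64 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags an x86 CMP a, b (which computes a - b) would produce. */
typedef struct { bool zf, sf, cf, of; } x86_flags;

static x86_flags x86_cmp(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua - ub;
    x86_flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    f.cf = ua < ub;                              /* unsigned borrow */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;  /* signed overflow */
    return f;
}

/* A few x86 conditional jumps (names follow the x86 mnemonics). */
typedef enum { JE, JNE, JL, JGE, JB, JAE } jcc;

/* Unfused path: materialize the flags, then evaluate the Jcc condition on them. */
static bool taken_from_flags(jcc cc, x86_flags f) {
    switch (cc) {
    case JE:  return f.zf;
    case JNE: return !f.zf;
    case JL:  return f.sf != f.of;
    case JGE: return f.sf == f.of;
    case JB:  return f.cf;
    case JAE: return !f.cf;
    }
    return false;
}

/* Fused path: the flags are dead after the jump, so compare the operands
   directly, the way a single RISC-V compare-and-branch instruction would. */
static bool taken_fused(jcc cc, int32_t a, int32_t b) {
    switch (cc) {
    case JE:  return a == b;
    case JNE: return a != b;
    case JL:  return a < b;
    case JGE: return a >= b;
    case JB:  return (uint32_t)a < (uint32_t)b;
    case JAE: return (uint32_t)a >= (uint32_t)b;
    }
    return false;
}

/* Both paths must agree for every condition, otherwise the fusion would be wrong. */
static void check_fusion(int32_t a, int32_t b) {
    x86_flags f = x86_cmp(a, b);
    for (int cc = JE; cc <= JAE; cc++)
        assert(taken_from_flags((jcc)cc, f) == taken_fused((jcc)cc, a, b));
}
```

The saving comes from skipping the four flag computations in `x86_cmp` entirely whenever nothing after the jump reads them.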

Loongarch already had native flags handling in the previous version, so nothing new on that front.

Strong memory emulation

The strong memory model emulation also got a much-needed refactor, to make it cleaner (code-wise) and to allow more tweaking. The emulation works by inserting memory barriers when the code writes to memory, with a strategy that tries to place as few barriers as possible so performance is not impacted too much. A new option has been introduced to use weaker barriers (i.e. more speed). The refactor also fixed issues where barriers were not placed optimally. Together, these changes improve performance in games that need strong memory barriers (like most Windows Unity3D games, for example).
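As a rough illustration of "as few barriers as possible", the sketch below coalesces barriers: instead of one barrier per store, it emits a single barrier after each run of consecutive stores. This is a hypothetical pass over a hypothetical op list, not box64's actual placement strategy. The `weak` switch mimics the idea of the new option by picking a store-only barrier (comparable to ARM64 `dmb ishst`) instead of a full one (`dmb ish`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op kinds for a translated block of code. */
typedef enum { OP_LOAD, OP_STORE, OP_ALU, OP_BARRIER_FULL, OP_BARRIER_STORE } op_kind;

/* Copies `in` to `out`, adding one barrier after each run of consecutive
   stores instead of one barrier per store. `out` must have room for
   2 * n entries. Returns the number of ops written to `out`. */
static size_t insert_barriers(const op_kind *in, size_t n, op_kind *out, int weak) {
    const op_kind barrier = weak ? OP_BARRIER_STORE : OP_BARRIER_FULL;
    size_t m = 0;
    int pending = 0; /* 1 if stores were emitted since the last barrier */
    for (size_t i = 0; i < n; i++) {
        assert(in[i] == OP_LOAD || in[i] == OP_STORE || in[i] == OP_ALU);
        if (in[i] != OP_STORE && pending) {
            out[m++] = barrier; /* close the current run of stores */
            pending = 0;
        }
        out[m++] = in[i];
        if (in[i] == OP_STORE)
            pending = 1;
    }
    if (pending)
        out[m++] = barrier;     /* the block ends with a store */
    return m;
}
```

For the sequence store, store, load, alu, store, this emits two barriers instead of three; the trade-off of the weak variant is that a store-only barrier does not order those stores against later loads.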

Box64 vs Rosetta

With all those speed improvements, it was time to see how box64 now compares with Rosetta on Apple Silicon Macs. Using an M1 MacBook Pro running macOS (with Rosetta to emulate x86_64) and Fedora (with box64 to emulate x86_64), the 7zip benchmark (version 23.01) was used once again.

And yeah, while the native version of 7zip seems faster on macOS than on Fedora, Box64 is now faster than Rosetta on the 7zip benchmark!

I also tested another piece of software: dav1d, an AV1 decoder. This one uses all sorts of CPU extensions, with hand-optimized assembly. It also uses AVX if available.

This time, box64 is a bit slower.

Box32

While box86 runs 32-bit programs on 32-bit systems and box64 runs 64-bit programs on 64-bit systems, a new need is emerging: running 32-bit programs on 64-bit systems. This bitness change is unnatural, but it is forced by the industry’s current push toward 64-bit everywhere. Note that only the x86 world is keeping 32-bit alive: even the newest state-of-the-art 64-bit processors on that architecture keep some 32-bit compatibility, while all RISC-based architectures are moving away from 32-bit. ARM removed all 32-bit capability in Armv9, RISC-V 32-bit processors are separate from 64-bit ones, and Loongarch is just LA64. So box32 is there to allow 32-bit programs to run on a 64-bit OS. It is actually a subpart of box64, for ease of maintenance and because the two have many parts in common. But box32 differs a lot when it comes to library wrapping, because of the bitness change.
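Here is a small example of what the bitness change means for wrapping. A 32-bit program and a 64-bit native library disagree on the layout of the same C struct, so arguments have to be converted on each call. The sketch widens the i386 layout of `struct iovec` (4-byte pointer, 4-byte `size_t`) into the native 64-bit one before a call such as `readv()`, and narrows it back. It is a hypothetical helper, not box32 code, and it assumes the 32-bit program's memory lives in the low 4 GiB of the 64-bit address space, so a 32-bit pointer is simply zero-extended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* struct iovec as laid out for a 32-bit (i386) program: 8 bytes total. */
typedef struct {
    uint32_t iov_base; /* 32-bit pointer */
    uint32_t iov_len;  /* 32-bit size_t */
} iovec32;

/* Assumption: guest memory sits below 4 GiB, so widening is a zero-extension. */
static void *from_ptr32(uint32_t p) { return (void *)(uintptr_t)p; }

static uint32_t to_ptr32(const void *p) {
    assert((uintptr_t)p <= UINT32_MAX); /* must be reachable by 32-bit code */
    return (uint32_t)(uintptr_t)p;
}

/* Widen 32-bit iovecs before handing them to a native 64-bit function. */
static void iovec32_to_native(const iovec32 *src, struct iovec *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i].iov_base = from_ptr32(src[i].iov_base);
        dst[i].iov_len = src[i].iov_len;
    }
}

/* Narrow native iovecs back into the 32-bit program's layout. */
static void native_to_iovec32(const struct iovec *src, iovec32 *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i].iov_base = to_ptr32(src[i].iov_base);
        assert(src[i].iov_len <= UINT32_MAX);
        dst[i].iov_len = (uint32_t)src[i].iov_len;
    }
}
```

A 64-bit-to-64-bit wrapper in box64 can usually pass such a struct through untouched; with box32, every function taking a pointer to a struct with pointer or `long` fields needs a conversion like this, which is why its wrapping is more work.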

Box32 is disabled by default in box64 for now, but it’s a build option that can easily be switched on.

It’s still a young project, so not many things work yet (Steam and even steamcmd don’t, for example). But some Linux games work, and when they work, they are fast. Some Windows games can also be played, using a “regular” 64-bit version of Wine. The upside of this solution compared to the new WoW64 build of Wine (the new WoW64 does the same thing as box32, changing bitness on function calls, but inside Wine instead) is that OpenGL runs at full speed with box32, while Wine may use intermediate buffer copies (to keep buffers in the 32-bit address space) that can cause slowdowns. Those buffer copies are not needed with Vulkan, thanks to an extension (and a box64 hack can help if the extension is not present), so for dxvk the new WoW64 is still the preferred solution. Also, box32 doesn’t support Vulkan yet, so it’s limited to WineD3D for now.

Linux version of Limbo running on Loongarch with Box32

Stability improvements

Along with some small fixes to the dynarec code generator, there have been a few x87 emulation improvements that allow some old 32-bit Windows games to work properly now (both box32 and box64 with the new Wine WoW64 benefit from those fixes). The strong memory model refactor also fixed a few issues, and there is now a new subproject in box64: the wrapperhelper. There was a first attempt at this before, but the new version is a completely new project, not based on LLVM (it implements its own C parser), that generates library wrappings faster and more accurately than writing them by hand. A few bugs have already been fixed thanks to it. If you want to contribute by wrapping new libraries in box64, the article on how to create a wrapping has been updated with a chapter about the wrapperhelper. Note that this article only covers box64 wrapping, not box32 wrapping, which is slightly different, more complex, and subject to change.
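To give an idea of the kind of mechanical work a tool like the wrapperhelper takes over, the sketch below turns a parsed C prototype into a compact signature string of the form return letter, then `F`, then one letter per argument (for example `iFpi` for `int f(void *, int)`). The letter codes are our assumption of box64's convention and only a small subset is shown; the type enum and function names are hypothetical, and the wrapping article remains the reference for the real format.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny subset of C types a parser might hand over (hypothetical enum). */
typedef enum { T_VOID, T_INT32, T_UINT32, T_INT64, T_UINT64, T_FLOAT, T_DOUBLE, T_PTR } ctype;

/* Letter codes assumed to follow box64's signature convention. */
static char sig_letter(ctype t) {
    switch (t) {
    case T_VOID:   return 'v';
    case T_INT32:  return 'i';
    case T_UINT32: return 'u';
    case T_INT64:  return 'I';
    case T_UINT64: return 'U';
    case T_FLOAT:  return 'f';
    case T_DOUBLE: return 'd';
    case T_PTR:    return 'p';
    }
    return '?';
}

/* Builds "<ret>F<args>"; an empty argument list is written as "v".
   Returns 0 on success, -1 if `out` (of size `cap`) is too small. */
static int make_signature(ctype ret, const ctype *args, size_t nargs, char *out, size_t cap) {
    size_t need = 2 + (nargs ? nargs : 1) + 1; /* ret + 'F' + args + NUL */
    if (cap < need)
        return -1;
    size_t k = 0;
    out[k++] = sig_letter(ret);
    out[k++] = 'F';
    if (nargs == 0)
        out[k++] = 'v';
    for (size_t i = 0; i < nargs; i++) {
        assert(args[i] != T_VOID); /* void only appears as "no arguments" */
        out[k++] = sig_letter(args[i]);
    }
    out[k] = '\0';
    return 0;
}
```

Doing this by hand for hundreds of functions is exactly where typos creep in, which is why generating it from the library headers helps accuracy.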

Conclusion

Head over to GitHub to grab the new sources for box86 and box64 and enjoy even faster execution than before! If you are on a 64-bit-only platform, give box32 a try: you might get some things to run fine, especially old Linux games like Undertale, Limbo or Anomaly Warzone Earth (but be aware of the limitations: no steamcmd, no Steam, no Vulkan, and some things in Wine still don’t work correctly, registry access for example, it seems).
