A deep dive into library wrapping

After talking about the dynarec, we’re going to talk about library wrapping.

This article will be very technical from start to finish.

Note: until the “Differences” paragraph, everything said is identical for box86 and box64. As such, until then, you can replace every “box86” with “box64” and “x86” with “x86_64”.

The first step: loading

As discussed in the last article, when the main program starts, it loads itself but also other libraries. These libraries are grouped into two categories: the native libraries and the emulated libraries.

As one can guess, the emulated libraries are those that need to be, well, emulated. In other words, no distinction is made between the main program and those libraries: all the code is executed either by the dynarec or the x86 emulator. As such, they are also loaded the same as the main executable.

On the other hand, native libraries are the strength of box86. When one of their functions is called, execution leaves the emulated world and enters the native ARM world. As such, they are loaded differently: some memory is allocated for the bridges and the native library is loaded in memory.

In the ELF file format (loaded by box86), there are entries called “relocations”, which let the linker fill in whatever address it wants at a given location. (Side note: when symbols are requested dynamically, for example through dlsym, the native call is caught (see later) and the same mechanism applies.) When a relocation targets code that stays in the x86 realm, box86 does what ld.so (the Linux dynamic linker) would do. When relocating to a native address, however, box86 makes the address point to some pseudo-valid x86 code. This pseudo-valid code is an opcode that interrupts the program by design, 0xCC (INT3), followed by a special signature (the characters ‘S’ and ‘C’, hex 53 43) and then two pointers that are used later.
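To make the byte layout described above more concrete, here is a minimal sketch of such a bridge entry. The names (bridge_entry_t, make_bridge, is_bridge), the packing and the exact field order are assumptions made for illustration and are not taken from box86's sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one bridge entry: INT3, the 'S' 'C' signature,
   then two pointers. Names, packing and field order are assumptions. */
#pragma pack(push, 1)
typedef struct {
    uint8_t   int3;     /* 0xCC: traps if something executes it directly */
    uint8_t   sig[2];   /* 'S', 'C' (0x53, 0x43) */
    uintptr_t wrapper;  /* first pointer: the wrapper function */
    uintptr_t wrapped;  /* second pointer: the native (wrapped) function */
} bridge_entry_t;
#pragma pack(pop)

/* Fill in a bridge entry; its address is what the relocation would receive. */
static void make_bridge(bridge_entry_t *b, uintptr_t wrapper, uintptr_t wrapped)
{
    assert(b != NULL);
    b->int3 = 0xCC;
    b->sig[0] = 'S';
    b->sig[1] = 'C';
    b->wrapper = wrapper;
    b->wrapped = wrapped;
}

/* Return 1 if the bytes at addr start with CC 53 43, 0 otherwise. */
static int is_bridge(const void *addr)
{
    static const uint8_t magic[3] = { 0xCC, 0x53, 0x43 };
    return memcmp(addr, magic, sizeof magic) == 0;
}
```

In this sketch, an emulator loop would call is_bridge on the current instruction pointer before decoding anything, and only then read the two pointers.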

The last step: execution

When the program reaches this special signature (CC 53 43), box86 knows (almost certainly) that the two values that follow are actually pointers.

The first pointer is used as the wrapper function and the second is the wrapped function.

How does it all work? Well, each wrapper function is a function that takes a pointer to the current emu structure and the native (wrapped) function pointer. Then, given that all registers and the stack are stored in the emu structure, the wrapper unpacks the arguments and gives them all to the native function. Then, the native function executes (quickly, since it’s native) and returns, handing control back to box86. Finally, a ret (or retn, depending on the function) instruction is executed, and the program continues execution as normal.
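The unpack, call and return sequence above can be sketched on a toy model. Everything here is hypothetical: toy_emu_t is a heavily reduced stand-in for the real emu structure, and toy_iFii assumes a 32-bit cdecl-style stack where the return address sits at [esp] and the arguments follow it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the emu structure: only what this sketch needs. */
typedef struct {
    uint32_t eax;        /* emulated return-value register */
    uint32_t esp;        /* emulated stack pointer, as an offset into stack[] */
    uint8_t  stack[64];  /* emulated stack memory */
} toy_emu_t;

/* Read the n-th 32-bit argument; slot [esp] holds the return address. */
static uint32_t stack_arg(const toy_emu_t *emu, unsigned n)
{
    uint32_t v;
    uint32_t off = emu->esp + 4u * (n + 1u);
    assert(off + sizeof v <= sizeof emu->stack);
    memcpy(&v, &emu->stack[off], sizeof v);
    return v;
}

typedef int32_t (*iFii_t)(int32_t, int32_t);

/* Wrapper for "int f(int, int)": unpack the arguments from the emulated
   stack, call the native function, store the result in emulated EAX. */
static void toy_iFii(toy_emu_t *emu, uintptr_t fnc)
{
    iFii_t fn = (iFii_t)fnc;
    emu->eax = (uint32_t)fn((int32_t)stack_arg(emu, 0),
                            (int32_t)stack_arg(emu, 1));
}

/* A native function to wrap. */
static int32_t native_add(int32_t a, int32_t b) { return a + b; }
```

After the wrapper returns, the emulator would still have to perform the ret itself; that part is left out of the sketch.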

Now, how does box86 know which function has which signature? Well, that’s the hardest part, because this information is not kept when compiling a library! As such, it must be provided manually.

Providing signatures

This is what the src/wrapped/*_private.h files are for. There is a whole mechanism behind them, but all that is needed to make it work is to declare each function of the library there with the correct signature.

A signature is a string, where the first character is the return type, the second one is ‘F’ (to say, “this is a Function wrapper”), and the following characters are all of the arguments to the function.
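As a rough illustration of that string format, here is a small sketch that splits a signature into its return type and its argument letters. The function name parse_signature is hypothetical. The sample string pFppL (a pointer return, two pointers and an unsigned long, which is what a memcpy-like function would plausibly use) and the rule that a lone v after the F means “no arguments” are assumptions based on my understanding of the naming scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a signature such as "pFppL" into its return type and argument list.
   Returns the number of arguments, or -1 if the string is not of the form
   <return type>F<arguments>. */
static int parse_signature(const char *sig, char *ret_type, const char **args)
{
    assert(sig != NULL && ret_type != NULL && args != NULL);
    if (strlen(sig) < 2 || sig[1] != 'F')
        return -1;
    *ret_type = sig[0];
    *args = sig + 2;
    if (strcmp(*args, "v") == 0)
        return 0;   /* assumed convention: a lone 'v' means no arguments */
    return (int)strlen(*args);
}
```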

An example signature, memcpy

There are different macros used there:

GO: this represents a simple function;
GOW: this represents a weak function. It is still a simple function; however, if the symbol is also present elsewhere, this version is not used. This is used mainly (AFAIK) in libc, where most functions can be overridden by other libraries;
GOS: this represents a function that returns a flat structure that is more than 64 bits or 128 bits (depending on the size of a register) wide. Such a function must exit with a retn instead of a ret;
GOM/GOWM: these represent functions (respectively simple and weak) that need to be wrapped in a more complex manner. For these, box86 doesn’t call the native function but instead a custom function whose name is a library-wide prefix (usually my_) followed by the function name (which can become quite awkward when said function name starts with one or two underscores…). Such functions include, for example, dlsym, which also needs to check whether the library is emulated, and so on, or open and __open (as I said, awkward: they become my_open and my___open), which may attempt to open files in /dev;
DATA/DATAV/DATAM/DATAB: these represent data in the program (for example, environ is a symbol present in libc but it is not a function: DATAM is used to declare it). There are different data types, hence the different variable names.

Once a function is declared, it is added to a table specific to both the library and the function type; these tables are used by the various symbol resolvers to return the correct function.
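One common way to turn a list of macro invocations into lookup tables is the “X-macro” pattern, sketched below. This is not a copy of box86’s mechanism. The list, the signatures in it and every toy_ name are illustrative, and the sketch uses a single table with flags where the text above describes one table per function type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a *_private.h file: one entry per exported symbol.
   The entries and their signatures are illustrative only. */
#define TOY_PRIVATE_LIST \
    GO(strlen, LFp)      \
    GOW(memcpy, pFppL)   \
    GOM(open, iFEpOu)

typedef struct {
    const char *name;    /* symbol name as the emulated program sees it */
    const char *sig;     /* wrapper signature */
    int         weak;    /* 1 for GOW-style entries */
    int         custom;  /* 1 for GOM-style entries (served by my_<name>) */
} toy_symbol_t;

/* Expand the list into a table by defining the macros accordingly. */
#define GO(N, S)  { #N, #S, 0, 0 },
#define GOW(N, S) { #N, #S, 1, 0 },
#define GOM(N, S) { #N, #S, 0, 1 },
static const toy_symbol_t toy_table[] = { TOY_PRIVATE_LIST };
#undef GO
#undef GOW
#undef GOM

/* Linear lookup by name; returns NULL when the symbol is not declared. */
static const toy_symbol_t *toy_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];
    return NULL;
}
```

The appeal of this pattern is that declaring a new function stays a one-line change in the list, and every table generated from it follows automatically.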

An example wrapper declaration (wrapper.h)

An example wrapper definition (box86’s wrapper.c)

Differences between box86 and box64, or why is box64 faster

Now, between box86 and box64, there are differences.

First of all, up until now (August 13th, 2021) box64 has no function returning a flat structure requiring a GOS, while box86 has plenty. The calling convention is also completely different: the parameter-passing registers are not the same, nor is their width…

Difference between box86’s vFi wrapper (top) and box64’s one (bottom)

(As you can see, even on such a simple function, on 32 bits you need to dereference something in memory, while on 64 bits you don’t.)
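The memory-versus-register difference can be sketched with two toy wrappers for a “void f(int)” function. Both structures and all names are hypothetical. The only ABI facts relied upon are that 32-bit x86 passes arguments on the stack, and that the x86_64 System V convention passes the first integer argument in RDI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*vFi_t)(int32_t);

/* Toy 32-bit state: the argument lives in emulated stack memory. */
typedef struct {
    uint32_t esp;        /* offset into stack[] */
    uint8_t  stack[32];
} toy_emu32_t;

/* Toy 64-bit state: the first integer argument lives in a register. */
typedef struct {
    uint64_t rdi;
} toy_emu64_t;

/* 32-bit flavour: load the argument from the emulated stack,
   just past the return address. */
static void toy32_vFi(toy_emu32_t *emu, uintptr_t fnc)
{
    int32_t arg;
    assert(emu->esp + 4 + sizeof arg <= sizeof emu->stack);
    memcpy(&arg, &emu->stack[emu->esp + 4], sizeof arg);
    ((vFi_t)fnc)(arg);
}

/* 64-bit flavour: the value is read straight from the saved register,
   with no extra dereference through the stack pointer. */
static void toy64_vFi(toy_emu64_t *emu, uintptr_t fnc)
{
    ((vFi_t)fnc)((int32_t)emu->rdi);
}

/* Native function used for the demonstration: records its argument. */
static int32_t last_seen;
static void native_record(int32_t v) { last_seen = v; }
```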

Actually, the wrapping for box86 is the “least advanced” of the two, in some way. Because box64 also has what’s known in the code as “simple wrapping”: wrappings are considered “simple” when they don’t even need to go through the wrapper function, and instead you simply call the native function, bypassing the wrapper.

This saves some time, because whenever box64 exits the dynarec, all registers (GP registers as well as FPU/MMX/SSE registers) are flushed, potentially wasting some time. And of course, calling and returning from functions is not free, especially when jumping to a variable address.

All in all, this is what enables box64 to run much faster than box86: on OpenArena’s bench, box64 runs at ~90% of native speed while box86 is “only” at ~80%.

Generator

Given all of this, you may notice I didn’t talk about the Python script that generates the stuff in src/wrapped/generated. So, what is in there?

First of all, the file functions_list.txt is a human-readable version of all wrappers and M/S-type functions that were read by the script on its last run. It lets the script tell whether there is a real difference between the current run and the last one (in which case it may exit early, potentially speeding up the build process).

An example wrapper type declaration

Next, the files wrapper.h and wrapper.c contain the wrapper functions.

Then, the *types.h files. Those are used by some (but not all, since it is quite a recent addition) of the wrapped libraries (the C files in src/wrapped that contain the my_ definitions). They contain some boilerplate function definitions.

When you wrap a function, you usually need to call the original function as well. And most of the time, you don’t have it statically available, as most wrapped libraries aren’t loaded statically: for example, if an app uses SDL2, box86/box64 will load it dynamically. As such, when initializing a native library, box86/box64 requests every “doubly”-wrapped function from the native library and stores it with the correct signature for convenience down the line.
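Here is a minimal sketch of that “resolve once, store with the right type” idea. Everything in it is hypothetical: toy_resolve stands in for a dlsym-like lookup on the native library, and the structure and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Typed function pointers, named after a wrapper signature (illustrative). */
typedef int  (*iFi_t)(int);
typedef void (*generic_fn_t)(void);

/* The native functions that the custom wrappers need to call back into. */
typedef struct {
    iFi_t toy_abs;
    iFi_t toy_negate;
} toy_lib_my_t;

static int native_abs(int v)    { return v < 0 ? -v : v; }
static int native_negate(int v) { return -v; }

/* Hypothetical resolver standing in for a dlsym-like lookup. */
static generic_fn_t toy_resolve(const char *name)
{
    if (strcmp(name, "toy_abs") == 0)    return (generic_fn_t)native_abs;
    if (strcmp(name, "toy_negate") == 0) return (generic_fn_t)native_negate;
    return NULL;
}

/* Done once, when the native library is initialized. */
static void toy_lib_init(toy_lib_my_t *my)
{
    assert(my != NULL);
    my->toy_abs    = (iFi_t)toy_resolve("toy_abs");
    my->toy_negate = (iFi_t)toy_resolve("toy_negate");
}

/* A custom wrapper can then call the original without another lookup. */
static int my_toy_abs(const toy_lib_my_t *my, int v)
{
    return my->toy_abs(v);
}
```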

Those files, then, give every function that is needed along with its signature.

On box64, a new addition is the *defs.h and *undefs.h. Those are actually something I need to backport to box86.

They don’t seem very useful (they are almost all empty), but actually, if you open box86’s wrappedsdl2_private.h (to pick one at random), you will start seeing some weird comments like //%{pFJ}, and next to that a function whose signature is pFEV. But actually, the function takes an SDL_JoystickGUID as a parameter, which was declared (earlier in the file) as type J!

What this line means is that the function ABI is the same as if a V was given (so the wrapper used for this function is pFEV), but the real signature of the native function is pFJ. It is thus declared as so in the <lib>types.h file.

Such a way of defining custom types is very confusing, which is why it is fixed in box64! The same function’s signature would simply be pFEJ, and when declaring type J you would also indicate that it should be equivalent to (that is, replaced by, when looking for the wrapper to use) a V.

However, this would introduce an issue: what if two libraries were to declare the same letter for two different structures? How could the code know what this type should be replaced with? This is where the files ending in defs.h come in: they simply define the function type to be what the wrapper should really be (e.g., they define pFEJ as pFEV), and undefine it afterwards to avoid macro pollution.
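The define-then-undefine trick can be sketched in a single file as follows. The two wrappers are placeholders with made-up bodies, and the toy_ functions are hypothetical; the point is only to show that the same name pFEJ can resolve to different wrappers in different libraries without the two definitions clashing.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*toy_wrapper_t)(void *emu, void *arg);

/* Two generic wrappers, as a wrapper.h-like header might declare them.
   Their bodies are placeholders. */
static void *pFEV(void *emu, void *arg) { (void)emu; return arg; }
static void *pFEp(void *emu, void *arg) { (void)arg; return emu; }

/* What a <lib>defs.h-like file would contain for a first library:
   here, type J has the ABI of V. */
#define pFEJ pFEV

/* Inside that library, the readable name resolves to the generic wrapper. */
static toy_wrapper_t toy_first_lib_wrapper(void) { return pFEJ; }

/* What the matching <lib>undefs.h-like file would contain. */
#undef pFEJ

/* A second library may now give J a different meaning without clashing. */
#define pFEJ pFEp
static toy_wrapper_t toy_second_lib_wrapper(void) { return pFEJ; }
#undef pFEJ
```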

Currently, this is only used once in SDL2 in box64, but it should be useful and prevent errors from occurring in both box64 and box86.

For example, on x86, every time a “flat structure” is returned, a “phantom” pointer parameter is given. Of course, the script checks that the first parameter in the signature really is a pointer, but what if the first real parameter of the function is also a pointer? In this case, the script has no way of detecting that an error was made, and will happily go on its merry way. The error has become undetectable.

However, with the new method, there is no possibility of such an error occurring: the script detects the use of a “flat structure” as a return type, and correctly changes the type.
