Hash :
9ba55921
Author :
Date :
2021-03-05T10:07:30
Static tramp v5 (#624)
* Static Trampolines
Closure Trampoline Security Issue
=================================
Currently, the trampoline code used in libffi is not statically defined in
a source file (except for MACH). The trampoline is either pre-defined
machine code in a data buffer or generated at runtime. In order to
execute a trampoline, it needs to be placed in a page with executable
permissions.
Executable data pages are attack surfaces for attackers who may try to
inject their own code into the page and contrive to have it executed. The
security settings in a system may prevent various tricks used in user land
to write code into a page and to have it executed somehow. On such systems,
libffi trampolines would not be able to run.
Static Trampoline
=================
To solve this problem, the trampoline code needs to be defined statically
in a source file, compiled and placed in the text segment so it can be
mapped and executed naturally without any tricks. However, the trampoline
needs to be able to access the closure pointer at runtime.
PC-relative data referencing
============================
The solution implemented in this patch set uses PC-relative data references.
The trampoline is mapped in a code page. Adjacent to the code page, a data
page is mapped that contains the parameters of the trampoline:
- the closure pointer
- pointer to the ABI handler to jump to
The trampoline code uses an offset relative to its current PC to access its
data.
Some architectures support PC-relative data references in the ISA itself.
E.g., X64 supports RIP-relative references. For others, the PC has to
somehow be loaded into a general purpose register to do PC-relative data
referencing. To do this, we need to define a get_pc() kind of function and
call it to load the PC in a desired register.
There are two cases:
1. The call instruction pushes the return address on the stack.
In this case, get_pc() will extract the return address from the stack
and load it in the desired register and return.
2. The call instruction stores the return address in a designated register.
In this case, get_pc() will copy the return address to the desired
register and return.
Either way, the address of the instruction that follows the call is obtained.
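The addressing step can be sketched in C. This is only an illustration under stated assumptions: struct tramp_params and params_from_pc() are hypothetical names, not libffi identifiers, and a real trampoline performs this step in a few assembly instructions rather than in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one trampoline's parameters in the data page. */
struct tramp_params {
    void *closure;      /* the closure pointer */
    void *abi_handler;  /* the ABI handler to jump to */
};

/*
 * pc          - address obtained by the trampoline, either through a
 *               PC-relative addressing mode or a get_pc()-style call
 * data_offset - constant distance from that PC to the parameters
 */
static const struct tramp_params *
params_from_pc(uintptr_t pc, uintptr_t data_offset)
{
    uintptr_t addr = pc + data_offset;

    /* The parameters are pointers, so the result must be aligned. */
    assert(addr % _Alignof(struct tramp_params) == 0);
    return (const struct tramp_params *)addr;
}
```

Because the offset is a constant, the same code bytes work wherever the code page and its data page are mapped, as long as the two stay adjacent.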
Scratch register
================
In order to do its job, the trampoline code would need to use a scratch
register. Depending on the ABI, there may not be a register available for
scratch. This problem needs to be solved so that all ABIs will work.
The trampoline will save two values on the stack:
- the closure pointer
- the original value of the scratch register
This is what the stack will look like:
sp before trampoline ------> --------------------
                             | closure pointer  |
                             --------------------
                             | scratch register |
sp after trampoline -------> --------------------
The ABI handler can do the following as needed by the ABI:
- the closure pointer can be loaded in a desired register
- the scratch register can be restored to its original value
- the stack pointer can be restored to its original value
(the value when the trampoline was invoked)
To do this, I have defined prolog code for each ABI handler. The legacy
trampoline jumps to the ABI handler directly. But the static trampoline
defined in this patch jumps to the prolog code which performs the above
actions before jumping to the ABI handler.
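What the prolog recovers from the stack can be modelled in C as follows. This is a rough sketch, assuming a stack that grows toward lower addresses and word-sized slots; struct tramp_frame, struct prolog_state and prolog_unpack() are hypothetical names used only for illustration. The real prologs are written in assembly for each ABI.

```c
#include <assert.h>
#include <stdint.h>

/* What the trampoline leaves on the stack, lowest address first. */
struct tramp_frame {
    uintptr_t saved_scratch;  /* original value of the scratch register */
    uintptr_t closure;        /* the closure pointer */
};

/* What the prolog hands on to the ABI handler. */
struct prolog_state {
    uintptr_t closure;  /* to be loaded in the register the ABI wants */
    uintptr_t scratch;  /* restored scratch register value */
    uintptr_t sp;       /* stack pointer as it was on trampoline entry */
};

static struct prolog_state prolog_unpack(const struct tramp_frame *sp_after)
{
    struct prolog_state s;

    assert(sp_after != 0);
    s.closure = sp_after->closure;
    s.scratch = sp_after->saved_scratch;
    /* Popping both slots yields the pre-trampoline stack pointer. */
    s.sp = (uintptr_t)(sp_after + 1);
    return s;
}
```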
Trampoline Table
================
In order to reduce the trampoline memory footprint, the trampoline code
would be defined as a code array in the text segment. This array would be
mapped into the address space of the caller. The mapping would, therefore,
contain a trampoline table.
Adjacent to the trampoline table mapping, there will be a data mapping that
contains a parameter table, one parameter block for each trampoline. The
parameter block will contain:
- a pointer to the closure
- a pointer to the ABI handler
The static trampoline code would finally look like this:
- Make space on the stack for the closure and the scratch register
by moving the stack pointer down
- Store the original value of the scratch register on the stack
- Using PC-relative reference, get the closure pointer
- Store the closure pointer on the stack
- Using PC-relative reference, get the ABI handler pointer
- Jump to the ABI handler
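The table layout lends itself to simple index arithmetic. The sketch below assumes that parameter blocks are spaced at the same stride as the trampolines, so that one constant offset (the mapping size) separates every trampoline from its parameters. The struct and function names are hypothetical, and the numbers used with them are illustrative rather than values taken from libffi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one code mapping plus its data mapping. */
struct tramp_table {
    uintptr_t code_base;   /* start of the trampoline code mapping */
    size_t    map_size;    /* size of the code mapping (data mapping follows) */
    size_t    tramp_size;  /* bytes occupied by one trampoline */
};

/* Number of trampolines that fit in one code mapping. */
static size_t tramp_count(const struct tramp_table *t)
{
    return t->map_size / t->tramp_size;
}

/* Address of the code for trampoline number `index`. */
static uintptr_t tramp_code(const struct tramp_table *t, size_t index)
{
    assert(index < tramp_count(t));
    return t->code_base + index * t->tramp_size;
}

/* Address of the matching parameter block: same position, one mapping later. */
static uintptr_t tramp_parm(const struct tramp_table *t, size_t index)
{
    return tramp_code(t, index) + t->map_size;
}
```

With a 4096-byte mapping and 32-byte trampolines, for example, tramp_count() gives 128 trampolines per mapping. In this layout the stride must be at least as large as one parameter block (two pointers).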
Mapping size
============
The size of the code mapping that contains the trampoline table needs to be
determined on a per architecture basis. If a particular architecture
supports multiple base page sizes, then the largest supported base page size
needs to be chosen. E.g., we choose 16K for ARM64.
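Since page sizes are powers of two, a mapping sized to the largest page size covers a whole number of pages at every smaller page size as well. A minimal check of a candidate mapping size against the running system might look like this; map_size_fits_pages() is a hypothetical helper, while sysconf(_SC_PAGESIZE) is the standard POSIX query.

```c
#define _POSIX_C_SOURCE 200809L  /* for sysconf() under strict ISO C modes */
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if a mapping of map_size bytes covers a whole number of
   pages on the running system. */
static bool map_size_fits_pages(long map_size)
{
    long page = sysconf(_SC_PAGESIZE);  /* runtime base page size */

    assert(page > 0);
    return map_size > 0 && map_size % page == 0;
}
```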
Trampoline allocation and free
==============================
Static trampolines are allocated in ffi_closure_alloc() and freed in
ffi_closure_free().
Normally, applications use these functions. But there are some cases out
there where the user of libffi allocates and manages its own closure
memory. In such cases, static trampolines cannot be used. These will
fall back to using legacy trampolines. The user has to make sure that
the memory is executable.
ffi_closure structure
=====================
I did not want to make any changes to the size of the closure structure for
this feature to guarantee compatibility. But the opaque static trampoline
handle needs to be stored in the closure. I have defined it as follows:
- char tramp[FFI_TRAMPOLINE_SIZE];
+ union {
+ char tramp[FFI_TRAMPOLINE_SIZE];
+ void *ftramp;
+ };
If static trampolines are used, then tramp[] is not needed to store a
dynamic trampoline. That space can be reused to store the handle. Hence,
the union.
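The size argument can be checked with a small C11 sketch. The two structs below are simplified stand-ins for the closure before and after the change, not the real ffi_closure definition, and the FFI_TRAMPOLINE_SIZE value is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; the real FFI_TRAMPOLINE_SIZE is per-architecture. */
#define FFI_TRAMPOLINE_SIZE 24

/* Simplified stand-ins for the closure before and after the change. */
struct closure_before {
    char tramp[FFI_TRAMPOLINE_SIZE];
    void *cif;
    void *fun;
    void *user_data;
};

struct closure_after {
    union {
        char tramp[FFI_TRAMPOLINE_SIZE];  /* legacy (dynamic) trampoline */
        void *ftramp;                     /* static trampoline handle */
    };
    void *cif;
    void *fun;
    void *user_data;
};

/* The union overlays the handle on tramp[], so nothing moves or grows. */
static_assert(sizeof(struct closure_after) == sizeof(struct closure_before),
              "closure size must not change");
static_assert(offsetof(struct closure_after, cif) ==
              offsetof(struct closure_before, cif),
              "member offsets must not change");
```

The checks hold as long as FFI_TRAMPOLINE_SIZE is at least the size of a pointer, which is what lets the handle reuse the tramp[] space.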
Architecture Support
====================
Support has been added for x64, i386, aarch64 and arm. Support for other
architectures can be added very easily in the future.
OS Support
==========
Support has been added for Linux. Support for other OSes can be added very
easily.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
* x86: Support for Static Trampolines
- Define the arch-specific initialization function ffi_tramp_arch ()
that returns trampoline size information to common code.
- Define the trampoline code mapping and data mapping sizes.
- Define the trampoline code table statically. Define two tables,
actually, one with CET and one without.
- Introduce a tiny prolog for each ABI handling function. The ABI
handlers addressed are:
- ffi_closure_unix64
- ffi_closure_unix64_sse
- ffi_closure_win64
The prolog functions are called:
- ffi_closure_unix64_alt
- ffi_closure_unix64_sse_alt
- ffi_closure_win64_alt
The legacy trampoline jumps to the ABI handler. The static
trampoline jumps to the prolog function. The prolog function uses
the information provided by the static trampoline, sets things up
for the ABI handler and then jumps to the ABI handler.
- Call ffi_tramp_set_parms () in ffi_prep_closure_loc () to
initialize static trampoline parameters.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
* i386: Support for Static Trampolines
- Define the arch-specific initialization function ffi_tramp_arch ()
that returns trampoline size information to common code.
- Define the trampoline code table statically. Define two tables,
actually, one with CET and one without.
- Introduce a tiny prolog for each ABI handling function. The ABI
handlers addressed are:
- ffi_closure_i386
- ffi_closure_STDCALL
- ffi_closure_REGISTER
The prolog functions are called:
- ffi_closure_i386_alt
- ffi_closure_STDCALL_alt
- ffi_closure_REGISTER_alt
The legacy trampoline jumps to the ABI handler. The static
trampoline jumps to the prolog function. The prolog function uses
the information provided by the static trampoline, sets things up
for the ABI handler and then jumps to the ABI handler.
- Call ffi_tramp_set_parms () in ffi_prep_closure_loc () to
initialize static trampoline parameters.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
* arm64: Support for Static Trampolines
- Define the arch-specific initialization function ffi_tramp_arch ()
that returns trampoline size information to common code.
- Define the trampoline code mapping and data mapping sizes.
- Define the trampoline code table statically.
- Introduce a tiny prolog for each ABI handling function. The ABI
handlers addressed are:
- ffi_closure_SYSV
- ffi_closure_SYSV_V
The prolog functions are called:
- ffi_closure_SYSV_alt
- ffi_closure_SYSV_V_alt
The legacy trampoline jumps to the ABI handler. The static
trampoline jumps to the prolog function. The prolog function uses
the information provided by the static trampoline, sets things up
for the ABI handler and then jumps to the ABI handler.
- Call ffi_tramp_set_parms () in ffi_prep_closure_loc () to
initialize static trampoline parameters.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
* arm: Support for Static Trampolines
- Define the arch-specific initialization function ffi_tramp_arch ()
that returns trampoline size information to common code.
- Define the trampoline code mapping and data mapping sizes.
- Define the trampoline code table statically.
- Introduce a tiny prolog for each ABI handling function. The ABI
handlers addressed are:
- ffi_closure_SYSV
- ffi_closure_VFP
The prolog functions are called:
- ffi_closure_SYSV_alt
- ffi_closure_VFP_alt
The legacy trampoline jumps to the ABI handler. The static
trampoline jumps to the prolog function. The prolog function uses
the information provided by the static trampoline, sets things up
for the ABI handler and then jumps to the ABI handler.
- Call ffi_tramp_set_parms () in ffi_prep_closure_loc () to
initialize static trampoline parameters.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
libffi-3.4 was released on TBD. Check the libffi web page for updates: http://sourceware.org/libffi/.
Compilers for high level languages generate code that follows certain conventions. These conventions are necessary, in part, for separate compilation to work. One such convention is the “calling convention”. The “calling convention” is essentially a set of assumptions made by the compiler about where function arguments will be found on entry to a function. A “calling convention” also specifies where the return value for a function is found.
Some programs may not know at the time of compilation what arguments are to be passed to a function. For instance, an interpreter may be told at run-time about the number and types of arguments used to call a given function. Libffi can be used in such programs to provide a bridge from the interpreter program to compiled code.
The libffi library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.
FFI stands for Foreign Function Interface. A foreign function interface is the popular name for the interface that allows code written in one language to call code written in another language. The libffi library really only provides the lowest, machine dependent layer of a fully featured foreign function interface. A layer must exist above libffi that handles type conversions for values passed between the two languages.
Libffi has been ported to many different platforms.
At the time of release, the following basic configurations have been tested:
| Architecture | Operating System | Compiler |
|---|---|---|
| AArch64 (ARM64) | iOS | Clang |
| AArch64 | Linux | GCC |
| AArch64 | Windows | MSVC |
| Alpha | Linux | GCC |
| Alpha | Tru64 | GCC |
| ARC | Linux | GCC |
| ARM | Linux | GCC |
| ARM | iOS | GCC |
| ARM | Windows | MSVC |
| AVR32 | Linux | GCC |
| Blackfin | uClinux | GCC |
| CSKY | Linux | GCC |
| HPPA | HPUX | GCC |
| KVX | Linux | GCC |
| IA-64 | Linux | GCC |
| M68K | FreeMiNT | GCC |
| M68K | Linux | GCC |
| M68K | RTEMS | GCC |
| M88K | OpenBSD/mvme88k | GCC |
| Meta | Linux | GCC |
| MicroBlaze | Linux | GCC |
| MIPS | IRIX | GCC |
| MIPS | Linux | GCC |
| MIPS | RTEMS | GCC |
| MIPS64 | Linux | GCC |
| Moxie | Bare metal | GCC |
| Nios II | Linux | GCC |
| OpenRISC | Linux | GCC |
| PowerPC 32-bit | AIX | IBM XL C |
| PowerPC 64-bit | AIX | IBM XL C |
| PowerPC | AMIGA | GCC |
| PowerPC | Linux | GCC |
| PowerPC | Mac OSX | GCC |
| PowerPC | FreeBSD | GCC |
| PowerPC 64-bit | FreeBSD | GCC |
| PowerPC 64-bit | Linux ELFv1 | GCC |
| PowerPC 64-bit | Linux ELFv2 | GCC |
| RISC-V 32-bit | Linux | GCC |
| RISC-V 64-bit | Linux | GCC |
| S390 | Linux | GCC |
| S390X | Linux | GCC |
| SPARC | Linux | GCC |
| SPARC | Solaris | GCC |
| SPARC | Solaris | Oracle Solaris Studio C |
| SPARC64 | Linux | GCC |
| SPARC64 | FreeBSD | GCC |
| SPARC64 | Solaris | Oracle Solaris Studio C |
| TILE-Gx/TILEPro | Linux | GCC |
| VAX | OpenBSD/vax | GCC |
| X86 | FreeBSD | GCC |
| X86 | GNU HURD | GCC |
| X86 | Interix | GCC |
| X86 | kFreeBSD | GCC |
| X86 | Linux | GCC |
| X86 | OpenBSD | GCC |
| X86 | OS/2 | GCC |
| X86 | Solaris | GCC |
| X86 | Solaris | Oracle Solaris Studio C |
| X86 | Windows/Cygwin | GCC |
| X86 | Windows/MingW | GCC |
| X86-64 | FreeBSD | GCC |
| X86-64 | Linux | GCC |
| X86-64 | Linux/x32 | GCC |
| X86-64 | OpenBSD | GCC |
| X86-64 | Solaris | Oracle Solaris Studio C |
| X86-64 | Windows/Cygwin | GCC |
| X86-64 | Windows/MingW | GCC |
| X86-64 | Mac OSX | GCC |
| Xtensa | Linux | GCC |
Please send additional platform test results to libffi-discuss@sourceware.org.
First you must configure the distribution for your particular system. Go to the directory you wish to build libffi in and run the “configure” program found in the root directory of the libffi source distribution. Note that building libffi requires a C99 compatible compiler.
If you’re building libffi directly from git hosted sources, configure won’t exist yet; run ./autogen.sh first. This will require that you install autoconf, automake and libtool.
You may want to tell configure where to install the libffi library and
header files. To do that, use the --prefix configure switch. Libffi
will install under /usr/local by default.
If you want to enable extra run-time debugging checks use the
--enable-debug configure switch. This is useful when your program dies
mysteriously while using libffi.
Another useful configure switch is --enable-purify-safety. Using this
will add some extra code which will suppress certain warnings when you
are using Purify with libffi. Only use this switch when using
Purify, as it will slow down the library.
If you don’t want to build documentation, use the --disable-docs
configure switch.
It’s also possible to build libffi on Windows platforms with Microsoft’s Visual C++ compiler. In this case, use the msvcc.sh wrapper script during configuration like so:
path/to/configure CC=path/to/msvcc.sh CXX=path/to/msvcc.sh LD=link CPP="cl -nologo -EP" CPPFLAGS="-DFFI_BUILDING_DLL"
For 64-bit Windows builds, use CC="path/to/msvcc.sh -m64" and
CXX="path/to/msvcc.sh -m64". You may also need to specify
--build appropriately.
It is also possible to build libffi on Windows platforms with the LLVM project’s clang-cl compiler, like below:
path/to/configure CC="path/to/msvcc.sh -clang-cl" CXX="path/to/msvcc.sh -clang-cl" LD=link CPP="clang-cl -EP"
When building with MSVC under a MingW environment, you may need to remove the line in configure that sets ‘fix_srcfile_path’ to a ‘cygpath’ command. (‘cygpath’ is not present in MingW, and is not required when using MingW-style paths.)
To build a static library for ARM64 with MSVC using a Visual Studio solution, use aarch64/Ffi_staticLib.sln in the msvc_build folder. The required header files are in aarch64/aarch64_include/.
SPARC Solaris builds require the use of the GNU assembler and linker.
Point the AS and LD environment variables at those tools prior to
configuration.
For iOS builds, the libffi.xcodeproj Xcode project is available.
Configure has many other options. Use configure --help to see them all.
Once configure has finished, type “make”. Note that you must be using GNU make. You can ftp GNU make from ftp.gnu.org:/pub/gnu/make .
To ensure that libffi is working as advertised, type “make check”. This will require that you have DejaGNU installed.
To install the library and header files, type make install.
See the git log for details at http://github.com/libffi/libffi.
3.4 TBD
Add support for Alibaba's CSKY architecture.
Add support for Intel Control-flow Enforcement Technology (CET).
Add support for ARM Pointer Authentication (PA).
Fix 32-bit PPC regression.
Fix MIPS soft-float problem.
3.3 Nov-23-19
Add RISC-V support.
New API in support of GO closures.
Add IEEE754 binary128 long double support for 64-bit Power.
Default to Microsoft's 64 bit long double ABI with Visual C++.
GNU compiler uses 80 bits (128 in memory) FFI_GNUW64 ABI.
Add Windows on ARM64 (WOA) support.
Add Windows 32-bit ARM support.
Raw java (gcj) API deprecated.
Add pre-built PDF documentation to source distribution.
Many new test cases and bug fixes.
3.2.1 Nov-12-14
Build fix for non-iOS AArch64 targets.
3.2 Nov-11-14
Add C99 Complex Type support (currently only supported on
s390).
Add support for PASCAL and REGISTER calling conventions on x86
Windows/Linux.
Add OpenRISC and Cygwin-64 support.
Bug fixes.
3.1 May-19-14
Add AArch64 (ARM64) iOS support.
Add Nios II support.
Add m88k and DEC VAX support.
Add support for stdcall, thiscall, and fastcall on non-Windows
32-bit x86 targets such as Linux.
Various Android, MIPS N32, x86, FreeBSD and UltraSPARC IIi
fixes.
Make the testsuite more robust: eliminate several spurious
failures, and respect the $CC and $CXX environment variables.
Archive off the manually maintained ChangeLog in favor of git
log.
3.0.13 Mar-17-13
Add Meta support.
Add missing Moxie bits.
Fix stack alignment bug on 32-bit x86.
Build fix for m68000 targets.
Build fix for soft-float Power targets.
Fix the install dir location for some platforms when building
with GCC (OS X, Solaris).
Fix Cygwin regression.
3.0.12 Feb-11-13
Add Moxie support.
Add AArch64 support.
Add Blackfin support.
Add TILE-Gx/TILEPro support.
Add MicroBlaze support.
Add Xtensa support.
Add support for PaX enabled kernels with MPROTECT.
Add support for native vendor compilers on
Solaris and AIX.
Work around LLVM/GCC interoperability issue on x86_64.
3.0.11 Apr-11-12
Lots of build fixes.
Add support for variadic functions (ffi_prep_cif_var).
Add Linux/x32 support.
Add thiscall, fastcall and MSVC cdecl support on Windows.
Add Amiga and newer MacOS support.
Add m68k FreeMiNT support.
Integration with iOS' xcode build tools.
Fix Octeon and MC68881 support.
Fix code pessimizations.
3.0.10 Aug-23-11
Add support for Apple's iOS.
Add support for ARM VFP ABI.
Add RTEMS support for MIPS and M68K.
Fix instruction cache clearing problems on
ARM and SPARC.
Fix the N64 build on mips-sgi-irix6.5.
Enable builds with Microsoft's compiler.
Enable x86 builds with Oracle's Solaris compiler.
Fix support for calling code compiled with Oracle's Sparc
Solaris compiler.
Testsuite fixes for Tru64 Unix.
Additional platform support.
3.0.9 Dec-31-09
Add AVR32 and win64 ports. Add ARM softfp support.
Many fixes for AIX, Solaris, HP-UX, *BSD.
Several PowerPC and x86-64 bug fixes.
Build DLL for windows.
3.0.8 Dec-19-08
Add *BSD, BeOS, and PA-Linux support.
3.0.7 Nov-11-08
Fix for ppc FreeBSD.
(thanks to Andreas Tobler)
3.0.6 Jul-17-08
Fix for closures on sh.
Mark the sh/sh64 stack as non-executable.
(both thanks to Kaz Kojima)
3.0.5 Apr-3-08
Fix libffi.pc file.
Fix #define ARM for IcedTea users.
Fix x86 closure bug.
3.0.4 Feb-24-08
Fix x86 OpenBSD configury.
3.0.3 Feb-22-08
Enable x86 OpenBSD thanks to Thomas Heller, and
x86-64 FreeBSD thanks to Björn König and Andreas Tobler.
Clean up test instruction in README.
3.0.2 Feb-21-08
Improved x86 FreeBSD support.
Thanks to Björn König.
3.0.1 Feb-15-08
Fix instruction cache flushing bug on MIPS.
Thanks to David Daney.
3.0.0 Feb-15-08
Many changes, mostly thanks to the GCC project.
Cygnus Solutions is now Red Hat.
[10 years go by...]
1.20 Oct-5-98
Raffaele Sena produces ARM port.
1.19 Oct-5-98
Fixed x86 long double and long long return support.
m68k bug fixes from Andreas Schwab.
Patch for DU assembler compatibility for the Alpha from Richard
Henderson.
1.18 Apr-17-98
Bug fixes and MIPS configuration changes.
1.17 Feb-24-98
Bug fixes and m68k port from Andreas Schwab. PowerPC port from
Geoffrey Keating. Various x86, Sparc and MIPS bug fixes.
1.16 Feb-11-98
Richard Henderson produces Alpha port.
1.15 Dec-4-97
Fixed an n32 ABI bug. New libtool, auto* support.
1.14 May-13-97
libtool is now used to generate shared and static libraries.
Fixed a minor portability problem reported by Russ McManus
<mcmanr@eq.gs.com>.
1.13 Dec-2-96
Added --enable-purify-safety to keep Purify from complaining
about certain low level code.
Sparc fix for calling functions with < 6 args.
Linux x86 a.out fix.
1.12 Nov-22-96
Added missing ffi_type_void, needed for supporting void return
types. Fixed test case for non MIPS machines. Cygnus Support
is now Cygnus Solutions.
1.11 Oct-30-96
Added notes about GNU make.
1.10 Oct-29-96
Added configuration fix for non GNU compilers.
1.09 Oct-29-96
Added --enable-debug configure switch. Clean-ups based on LCLint
feedback. ffi_mips.h is always installed. Many configuration
fixes. Fixed ffitest.c for sparc builds.
1.08 Oct-15-96
Fixed n32 problem. Many clean-ups.
1.07 Oct-14-96
Gordon Irlam rewrites v8.S again. Bug fixes.
1.06 Oct-14-96
Gordon Irlam improved the sparc port.
1.05 Oct-14-96
Interface changes based on feedback.
1.04 Oct-11-96
Sparc port complete (modulo struct passing bug).
1.03 Oct-10-96
Passing struct args, and returning struct values works for
all architectures/calling conventions. Expanded tests.
1.02 Oct-9-96
Added SGI n32 support. Fixed bugs in both o32 and Linux support.
Added "make test".
1.01 Oct-8-96
Fixed float passing bug in mips version. Restructured some
of the code. Builds cleanly with SGI tools.
1.00 Oct-7-96
First release. No public announcement.
libffi was originally written by Anthony Green <green@moxielogic.com>.
The developers of the GNU Compiler Collection project have made innumerable valuable contributions. See the ChangeLog file for details.
Some of the ideas behind libffi were inspired by Gianni Mariani’s free gencall library for Silicon Graphics machines.
The closure mechanism was designed and implemented by Kresten Krab Thorup.
Major processor architecture ports were contributed by the following developers:
aarch64 Marcus Shawcroft, James Greenhalgh
alpha Richard Henderson
arc Hackers at Synopsys
arm Raffaele Sena
avr32 Bradley Smith
blackfin Alexandre Keunecke I. de Mendonca
cris Simon Posnjak, Hans-Peter Nilsson
csky Ma Jun, Zhang Wenmeng
frv Anthony Green
ia64 Hans Boehm
m32r Kazuhiro Inaoka
m68k Andreas Schwab
m88k Miod Vallat
metag Hackers at Imagination Technologies
microblaze Nathan Rossi
mips Anthony Green, Casey Marshall
mips64 David Daney
moxie Anthony Green
nios ii Sandra Loosemore
openrisc Sebastian Macke
pa Randolph Chung, Dave Anglin, Andreas Tobler
powerpc Geoffrey Keating, Andreas Tobler, David Edelsohn, John Hornkvist
powerpc64 Jakub Jelinek
riscv Michael Knyszek, Andrew Waterman, Stef O'Rear
s390 Gerhard Tonn, Ulrich Weigand
sh Kaz Kojima
sh64 Kaz Kojima
sparc Anthony Green, Gordon Irlam
tile-gx/tilepro Walter Lee
vax Miod Vallat
x86 Anthony Green, Jon Beniston
x86-64 Bo Thorsen
xtensa Chris Zankel
Jesper Skov and Andrew Haley both did more than their fair share of stepping through the code and tracking down bugs.
Thanks also to Tom Tromey for bug fixes, documentation and configuration help.
Thanks to Jim Blandy, who provided some useful feedback on the libffi interface.
Andreas Tobler has done a tremendous amount of work on the testsuite.
Alex Oliva solved the executable page problem for SELinux.
The list above is almost certainly incomplete and inaccurate. I’m happy to make corrections or additions upon request.
If you have a problem, or have found a bug, please send a note to the author at green@moxielogic.com, or the project mailing list at libffi-discuss@sourceware.org.