Show HN: A 166 KB file for cross compiling glibc for any version, any target

This repository contains .abilist files from every version of glibc. These
files are consolidated to generate a single 166 KB symbol mapping file that is
shipped with Zig to target any version of glibc. This repository is for Zig
maintainers to use when a new glibc version is tagged upstream; Zig users have
no need for this repository.

Adding new glibc version .abilist files

  1. Clone glibc:

git clone git://sourceware.org/git/glibc.git

  2. Check out the new glibc version git tag, e.g. glibc-2.34.

  3. Run the tool to grab the new abilist files:

zig run collect_abilist_files.zig -- $GLIBC_GIT_REPO_PATH

  4. This mirrors the directory structure into the glibc subdirectory,
    namespaced under the version number, but only copying files with the
    .abilist extension.

  5. Inspect the changes and then commit these new files into git.

Updating Zig

  1. Run consolidate.zig at the root of this repo.

This will generate the file abilists, which you can then inspect to make sure
it is OK. Copy it to $ZIG_GIT_REPO_PATH/lib/libc/glibc/abilist.

Debugging an abilists file

zig run list_symbols.zig -- abilists

Strategy

The abilist files from the latest glibc are almost enough to completely
encode all the information that we need to generate the symbols db. The only
problem is when a function migrates from one library to another. For example,
in glibc 2.32, the function pthread_sigmask migrated from libpthread to libc,
and the latest abilist files only show it in libc. However, if a user targets
glibc 2.31, Zig needs to know to put the symbol into libpthread.so and not
libc.so.

In glibc upstream, they simply renamed the abilist files from pthread.abilist to
libc.abilist. This resulted in the following line being present in libc.abilist
in glibc 2.32 and later:

GLIBC_2.0 pthread_sigmask F

This implies that in glibc 2.0, libc.so has the pthread_sigmask symbol, which
is false, since it was only found in libpthread.so.

This is why this repository contains abilist files from all past
versions of glibc as well as the latest one: it allows us to
detect this situation and generate a corrected symbols database.
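To make the abilist line format concrete, here is a minimal Python sketch that parses a line such as the pthread_sigmask one quoted above. The names AbiEntry and parse_abilist_line are hypothetical, and the handling of a fourth hexadecimal size field for object symbols is an assumption about the file format, not something this README specifies; the real tool is written in Zig.

```python
from typing import NamedTuple, Optional, Tuple


class AbiEntry(NamedTuple):
    version: Tuple[int, ...]  # (2, 0) for a GLIBC_2.0 tag
    symbol: str
    kind: str                 # "F" marks a function
    size: Optional[int]       # object size in bytes, None when absent


def parse_abilist_line(line: str) -> AbiEntry:
    """Parse one whitespace-separated abilist line (illustrative only)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields: {line!r}")
    tag, symbol, kind = fields[0], fields[1], fields[2]
    prefix = "GLIBC_"
    if not tag.startswith(prefix):
        raise ValueError(f"unexpected version tag: {tag!r}")
    version = tuple(int(part) for part in tag[len(prefix):].split("."))
    # Assumption: object entries carry a trailing hex size such as 0x4.
    size = int(fields[3], 16) if len(fields) > 3 else None
    return AbiEntry(version, symbol, kind, size)


entry = parse_abilist_line("GLIBC_2.0 pthread_sigmask F")
```

Parsing the version into a tuple of integers makes version comparisons straightforward, since (2, 5) sorts before (2, 17) while the strings "2.5" and "2.17" do not.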

The strategy is to start with the earliest glibc version, consume the abilist
files, and then treat that data as true. Next we move on to the next
earliest glibc version, but now we have to detect a contradiction: if the newer
glibc version claims that e.g. pthread_sigmask is available in glibc 2.0,
when our true data says that it is not, we ignore that false piece of
data. However, we must pick up new data if the version it talks about is greater
than the version corresponding to the “true” data set.

After merging in the newer glibc version, we mark the new dataset as
“true” and move on to the next, and so on until we have processed all the
sets of abilist files.
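The merge described above can be sketched in Python as follows. This is a simplified model, not the actual consolidate.zig logic: it ignores targets and object sizes, it represents each claim as a hypothetical (library, symbol, introduced_version) tuple, and the sample data (including the GLIBC_2.32 entry) is made up for illustration.

```python
def merge_releases(releases):
    """Merge abilist claims from oldest to newest release.

    releases: list of (release_version, claims), sorted oldest first, where
    claims is a set of (library, symbol, introduced_version) tuples.
    """
    truth = set()
    previous_release = None
    for release, claims in releases:
        for claim in claims:
            _library, _symbol, introduced = claim
            talks_about_past = (
                previous_release is not None and introduced <= previous_release
            )
            if talks_about_past and claim not in truth:
                # The newer release contradicts what we already hold true
                # about an older version, so the claim is ignored.
                continue
            truth.add(claim)
        # Everything merged so far is now treated as the "true" data set.
        previous_release = release
    return truth


# Hypothetical sample data for the pthread_sigmask migration.
releases = [
    ((2, 31), {("pthread", "pthread_sigmask", (2, 0))}),
    ((2, 32), {("c", "pthread_sigmask", (2, 0)),
               ("c", "pthread_sigmask", (2, 32))}),
]
merged = merge_releases(releases)
```

In this sample the claim that libc had pthread_sigmask in 2.0 is dropped, because it talks about a version no newer than the already processed 2.31 data and is not part of it, while the claim about 2.32 is picked up.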

When this process completes, we have in memory something that looks like this:

  • For each glibc symbol
    • For each glibc library
      • For each target
        • For each glibc version
          • Whether the symbol is absent, a function, or an object+size

And our job is now to encode this information into a file that does not waste
installation size and yet remains simple to decode and use in the Zig compiler.

Inclusions

Next, the script generates the minimal number of “inclusions” to encode all the
information. An “inclusion” is:

  • A symbol name.
  • The set of targets this inclusion applies to.
  • The set of glibc versions this inclusion applies to.
  • The set of libraries this inclusion applies to.
  • Whether it is a function or object, and if an object, its size in bytes.

For example, consider dlopen. An inclusion is something like this:

  • dlopen
  • targets: aarch64-linux-gnu powerpc64le-linux-gnu
  • versions: 2.17 2.34
  • libraries: libdl.so
  • type: function

This does not cover all the places dlopen can be found, however. There will
need to be more inclusions for more targets, for example:

  • dlopen
  • targets: x86_64-linux-gnu
  • versions: 2.2.5 2.34
  • libraries: libdl.so
  • type: function

Now we have more coverage of all the places dlopen can be found, but there are
yet more that need to be emitted. The script emits as many inclusions as
necessary so that all the information is represented.
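One way to picture an inclusion is as a small record. The Python sketch below models the two dlopen inclusions listed above; the Inclusion class, its field names, and the targets_covered helper are hypothetical and only mirror the bullet lists, not the tool's actual Zig data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Inclusion:
    symbol: str
    targets: FrozenSet[str]
    versions: FrozenSet[str]
    libraries: FrozenSet[str]
    object_size: Optional[int] = None  # None means the symbol is a function


DLOPEN_INCLUSIONS = [
    Inclusion(
        symbol="dlopen",
        targets=frozenset({"aarch64-linux-gnu", "powerpc64le-linux-gnu"}),
        versions=frozenset({"2.17", "2.34"}),
        libraries=frozenset({"libdl.so"}),
    ),
    Inclusion(
        symbol="dlopen",
        targets=frozenset({"x86_64-linux-gnu"}),
        versions=frozenset({"2.2.5", "2.34"}),
        libraries=frozenset({"libdl.so"}),
    ),
]


def targets_covered(inclusions: Iterable[Inclusion], symbol: str) -> Set[str]:
    """Union of all targets for which some inclusion mentions the symbol."""
    covered: Set[str] = set()
    for inclusion in inclusions:
        if inclusion.symbol == symbol:
            covered |= inclusion.targets
    return covered
```

Calling targets_covered(DLOPEN_INCLUSIONS, "dlopen") returns the three targets named so far, which shows why further inclusions are needed before every target is represented.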

Next we make a few observations which lead to a more compact data encoding.

Observation: All symbols are consistently either functions or objects

There is no symbol that is a function on one target and an object on another
target. Similarly, there is no symbol that is a function on one glibc version
but an object in another, and there is no symbol that is a function in one
shared library but an object in another.

We exploit this by encoding functions and object symbols in separate lists.

Observation: Over half of the objects are exactly 4 bytes

51% of all object entries are 4 bytes, and 68% of all object entries are either
4 or 8 bytes.

Total object inclusions are 765. If we stored 4 and 8 byte objects in separate
lists, this would save 2 bytes from 520 inclusions, totaling 1 KB. Not worth it.
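As a quick check of that arithmetic, the following Python snippet uses only the figures quoted in this section. The constant names are made up for illustration.

```python
TOTAL_OBJECT_INCLUSIONS = 765
SHARE_4_OR_8_BYTES = 0.68       # share of object entries that are 4 or 8 bytes
BYTES_SAVED_PER_INCLUSION = 2   # saving per inclusion quoted in the text

affected = round(TOTAL_OBJECT_INCLUSIONS * SHARE_4_OR_8_BYTES)
saved_bytes = affected * BYTES_SAVED_PER_INCLUSION
print(affected, saved_bytes)  # prints "520 1040", i.e. about 1 KB
```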

Observation: Average number of different versions per inclusion is 1.02

Nearly every inclusion has just 1 version attached to it, rarely more.
This makes a u64 bitset uneconomical. With 19530 total inclusions, this comes
out to 153 KB spent on the version bitset. However, if we stored it as one byte
per version, using 1 bit of the byte to indicate the terminal item, this would
bring the 153 KB down to 19 KB. That is almost a 50% reduction from the total
size of the encoded abilists file. Definitely worth it.
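A minimal Python sketch of such a one-byte-per-version list follows. Which bit marks the terminal item is not stated in this section, so the sketch assumes the high bit (0x80) and stores the version's index in the low 7 bits; the function names are hypothetical.

```python
TERMINAL_BIT = 0x80  # assumption: the high bit marks the last version in a list


def encode_versions(indexes):
    """Encode version-list indexes as one byte each, flagging the last one."""
    if not indexes:
        raise ValueError("an inclusion needs at least one version")
    out = bytearray()
    for position, index in enumerate(indexes):
        if not 0 <= index < TERMINAL_BIT:
            raise ValueError(f"version index out of range: {index}")
        is_last = position == len(indexes) - 1
        out.append(index | TERMINAL_BIT if is_last else index)
    return bytes(out)


def decode_versions(data, offset=0):
    """Return (indexes, next_offset) for the list starting at offset."""
    indexes = []
    while True:
        byte = data[offset]
        offset += 1
        indexes.append(byte & 0x7F)
        if byte & TERMINAL_BIT:
            return indexes, offset


INCLUSIONS = 19530
bitset_bytes = INCLUSIONS * 8  # one u64 per inclusion: 156240 bytes, about 153 KB
```

With an average of 1.02 versions per inclusion, the byte encoding needs roughly 19530 × 1.02 ≈ 19921 bytes, which is where the 19 KB figure comes from.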

Binary encoding format:

All integers are stored little-endian.

  • u8 number of glibc libraries (7). For each:
    • null-terminated name, e.g. “c”, “m”, “dl”, “ld”, “pthread”
  • u8 number of glibc versions (44), sorted ascending. For each:
    • u8 major
    • u8 minor
    • u8 patch
  • u8 number of targets (20). For each:
    • null-terminated target triple
  • u16 number of function inclusions (18765)
    • null-terminated symbol name (not repeated for subsequent same symbol inclusions)
    • Set of Unsized Inclusions
  • u16 number of object inclusion sets (2165)
    • null-terminated symbol name (not repeated for subsequent same symbol inclusions)
    • Set of Sized Inclusions
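The first three sections of this layout (libraries, versions, targets) are fully specified above, so a decoder for them can be sketched in Python. This is an illustration only, with hypothetical function names; the inclusion sections that follow the header are not handled here.

```python
def read_cstring(buf, offset):
    """Read a null-terminated ASCII string; return (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii"), end + 1


def parse_header(buf):
    """Decode the library, version, and target lists at the start of buf."""
    offset = 0

    library_count = buf[offset]
    offset += 1
    libraries = []
    for _ in range(library_count):
        name, offset = read_cstring(buf, offset)
        libraries.append(name)

    version_count = buf[offset]
    offset += 1
    versions = []
    for _ in range(version_count):
        # Three u8 fields per version: major, minor, patch.
        versions.append(tuple(buf[offset:offset + 3]))
        offset += 3

    target_count = buf[offset]
    offset += 1
    targets = []
    for _ in range(target_count):
        triple, offset = read_cstring(buf, offset)
        targets.append(triple)

    return libraries, versions, targets, offset


# A tiny hand-built header: 2 libraries, 2 versions, 1 target.
sample = (
    bytes([2]) + b"c\x00m\x00"
    + bytes([2, 2, 0, 0, 2, 2, 5])
    + bytes([1]) + b"x86_64-linux-gnu\x00"
)
libraries, versions, targets, end = parse_header(sample)
```

The returned offset points at the u16 count of function inclusions, which is where a full decoder would continue.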

Set of Unsized Inclusions:

  • u32 set of targets this inclusion applies to (1 << INDEX_IN_TARGET_LIST)
    • final inclusion is indicat