The linux loader, and how it finds libraries: ld-linux and so on

As part of an effort to understand implications of different installation procedures on Linux, I investigated how executables find shared object libraries (libxyz.so) for dynamic loading, and thus what an installer (program or human) needs to configure.

Overview

(Note: Originally written in 2009. Any updates are to fix typos. Thanks commenters!)

Most compiled programs on Linux need to call shared objects: modules provided by other packages that are loaded and linked dynamically (ie: at run time).  On Linux the needed modules are loaded when an executable is launched, by the GNU dynamic loader, ld.so, which is installed under a name such as /lib/ld-linux.so.2. (It is not to be confused with ld, the link editor used at build time.)

The way in which the loader finds the requested libraries goes a long way to explaining what a package installer must do to make the package work properly, and to allow other programs to find the libraries of the new package.

ELF: Executable and Linking Format

Most compiled programs on Linux are compiled into a format known as ELF. Amongst other things, ELF defines a header for executable files, which contains attributes of the executable, some of which are important for the loading process.

The program readelf can be used to view the header of such programs or .so libraries:

  • readelf -a filename       Shows all header info
  • readelf -d filename      Shows only data from the “dynamic” section

The “dynamic” section of the header is of interest because it contains data used during the initial loading process, such as:

  • NEEDED: libraries needed by this module
  • RPATH: See “Loader search procedure” below
  • SONAME: If this module is a library, this item shows the “soname” of the library.

(The ldd program provides information similar to this list of NEEDED libraries, but also adds the libraries needed by the NEEDED libraries, and so on.)
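To pull just these items out of readelf's output, something like the following Python sketch may help. It assumes the line layout printed by binutils readelf -d (tag name in parentheses, value in square brackets); the function name and the sample text are made up for illustration.

```python
import re

# Matches readelf -d lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libncurses.so.5]
_DYN_LINE = re.compile(r"\((NEEDED|RPATH|RUNPATH|SONAME)\)\s+[^\[]*\[([^\]]*)\]")


def parse_dynamic_section(readelf_output):
    """Collect NEEDED, RPATH, RUNPATH and SONAME values from `readelf -d` text."""
    entries = {"NEEDED": [], "RPATH": [], "RUNPATH": [], "SONAME": []}
    for line in readelf_output.splitlines():
        match = _DYN_LINE.search(line)
        if match:
            tag, value = match.groups()
            entries[tag].append(value)
    return entries


# Illustrative sample only; the offsets and values are invented.
SAMPLE = """\
Dynamic section at offset 0x2d60 contains 4 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libncurses.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [/usr/lib/mysql]
 0x000000000000000c (INIT)               0x1000
"""

entries = parse_dynamic_section(SAMPLE)
```

On a real system the text would come from running readelf -d on the file in question, for instance via subprocess.run.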

Loader search procedure

When a compiled program is launched on linux, its header is inspected to see what shared objects (libraries, xxx.so) it requires, and these are loaded. Each .so itself has a similar header, which can also specify other needed libraries, and so on.

From man ld-linux.so:

The shared libraries needed by the program are searched for in various places:

  1. DT_RPATH: Using the DT_RPATH dynamic section attribute of the binary if present and DT_RUNPATH attribute does not exist. Use of DT_RPATH is deprecated. (GW: ie, this is a value that can be included in an executable’s ELF header.  There’s apparently some controversy over whether DT_RPATH really overrides LD_LIBRARY_PATH.)
  2. LD_LIBRARY_PATH: Using the environment variable LD_LIBRARY_PATH. Except if the executable is a set-user-ID/set-group-ID binary, in which case it is ignored.
  3. DT_RUNPATH: Using the DT_RUNPATH dynamic section attribute of the binary if present. (GW: ie, the executable can provide a list of paths to search for objects to load. However, DT_RUNPATH is not applied at the point those objects load other objects.)
  4. /etc/ld.so.cache : From the cache file /etc/ld.so.cache which contains a compiled list of candidate libraries previously found in the augmented library path. If, however, the binary was linked with -z nodeflib linker option, libraries in the default library paths are skipped.
  5. Default paths: In the default path /lib, and then /usr/lib. If the binary was linked with -z nodeflib linker option, this step is skipped.
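The five steps above can be modelled roughly in Python. This is only a sketch of the order as the man page excerpt states it, not the loader's real code: the function name and parameters are invented, the cache is stood in for by a plain dict, and the real loader stops at the first candidate that actually exists on disk.

```python
import posixpath


def candidate_paths(soname, rpath=(), runpath=(), ld_library_path="",
                    cache=None, is_setuid=False, nodeflib=False):
    """Return the places that would be tried for soname, in the listed order."""
    cache = cache or {}
    candidates = []

    # 1. DT_RPATH, only when no DT_RUNPATH is present.
    if rpath and not runpath:
        candidates += [posixpath.join(d, soname) for d in rpath]

    # 2. LD_LIBRARY_PATH, ignored for set-user-ID/set-group-ID binaries.
    if ld_library_path and not is_setuid:
        candidates += [posixpath.join(d, soname)
                       for d in ld_library_path.split(":") if d]

    # 3. DT_RUNPATH.
    candidates += [posixpath.join(d, soname) for d in runpath]

    # 4. /etc/ld.so.cache, modelled here as a dict of soname -> full path.
    #    With -z nodeflib, entries in the default directories are skipped.
    if soname in cache:
        path = cache[soname]
        in_default_dir = posixpath.dirname(path) in ("/lib", "/usr/lib")
        if not (nodeflib and in_default_dir):
            candidates.append(path)

    # 5. Default paths, skipped when linked with -z nodeflib.
    if not nodeflib:
        candidates += [posixpath.join(d, soname) for d in ("/lib", "/usr/lib")]

    return candidates
```

For example, calling candidate_paths("libfoo.so.1", rpath=["/opt/a"], runpath=["/opt/b"]) leaves /opt/a out entirely, because the RPATH is ignored once a RUNPATH exists.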

Let’s elaborate on these search locations:

  1. DT_RPATH: deprecated, but still seen in practice. It can be useful for a program to specify the location of .so’s supplied within the same package and not necessarily useful to others. Eg: currently readelf shows that the /usr/bin/mysql program has an RPATH of /usr/lib/mysql, ie: it points to a mysql-specific subdir of /usr/lib.
  2. LD_LIBRARY_PATH: This is conveniently set from a shell script that launches the program proper, and hence it’s good for libraries that don’t need to be shared in general by the methods below. This seems to be used by
    1. Developers temporarily switching libraries for testing or
    2. Program suites that supply shared objects for their own use, but which don’t need to be shared with the rest of the system’s applications.  (In some quarters LD_LIBRARY_PATH is “considered harmful”.)
  3. DT_RUNPATH: (Seems not used much?)
  4. /etc/ld.so.cache : This is an important case, see below.
  5. Default paths: /lib (used for libraries of system packages), and then /usr/lib (possible location for the libraries of non-system packages).

Of these, /etc/ld.so.cache is used prominently by non-system packages, and needs more explanation.

/etc/ld.so.cache: cross-references

ld.so.cache provides a cross-reference from a shared-object’s name to its full path.  It is used by ld-linux as one of the methods to find the actual .so’s that are required by an executable being loaded.  ld.so.cache can be manipulated using the /sbin/ldconfig program:

  • ldconfig -p      Lists the cross-references currently known to the cache
  • ldconfig           (no args) Re-survey the directories where libraries reside, creating the needed symbolic links and updating the cache. (More on this below).

For example, readelf shows that the mysql program’s header dynamic section lists several NEEDED shared libraries, including libncurses.so.5.  Then, ldconfig -p shows:

libncurses.so.5 (libc6) => /usr/lib/libncurses.so.5

… and finally ls shows that /usr/lib/libncurses.so.5 is a symbolic link to libncurses.so.5.5, which is the actual file containing the library.
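The same lookup can be scripted. The Python sketch below turns ldconfig -p style text into a soname-to-path mapping; it assumes each entry has the "name (flags) => path" layout shown above, and the function name and sample text are illustrative only.

```python
import re

# One cache entry looks like:  libncurses.so.5 (libc6) => /usr/lib/libncurses.so.5
_CACHE_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig_listing(text):
    """Map each soname in `ldconfig -p` output to the first path listed for it."""
    mapping = {}
    for line in text.splitlines():
        match = _CACHE_LINE.match(line)
        if match:
            soname, _flags, path = match.groups()
            # Keep the first entry when a soname appears more than once.
            mapping.setdefault(soname, path)
    return mapping


# Illustrative sample only; a real listing has many more entries.
SAMPLE_LISTING = """\
2 libs found in cache `/etc/ld.so.cache'
\tlibncurses.so.5 (libc6) => /usr/lib/libncurses.so.5
\tlibc.so.6 (libc6) => /lib/libc.so.6
"""

cache = parse_ldconfig_listing(SAMPLE_LISTING)
```

A mapping like this is what the "/etc/ld.so.cache" step of the search procedure consults: the NEEDED name goes in, a full path comes out.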

Note on version numbering

The example shown here follows the pattern: libname.so.major.minor, where major and minor are major and minor version numbers. Libraries generally have an internal “SONAME” (as can be viewed with readelf) that includes the major version number but excludes the minor version number. The SONAME is also the name listed in the NEEDED listing to indicate a needed library.

The logic here appears to be that the major version number identifies a particular API (exact suite of functions which might change from major version to major version, and which needs to match what the calling program expects), while the minor version number indicates no change in API, perhaps bug fixes or other internal improvements only.

The above scheme is not strict. For example, we have libdl-2.5.so, which has an soname of libdl.so.2 (and a corresponding symbolic link).
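As a small illustration of the naming pattern, this Python sketch derives the conventional soname from a file name. It only encodes the libname.so.major.minor convention described here; the authoritative SONAME is the one stored inside the file, and the function name is invented.

```python
import re

# lib<name>.so.<major> optionally followed by .<minor>[.<patch>...]
_VERSIONED = re.compile(r"^(lib.+\.so)\.(\d+)((?:\.\d+)*)$")


def conventional_soname(filename):
    """Return libname.so.major for libname.so.major[.minor...], else None."""
    match = _VERSIONED.match(filename)
    if not match:
        return None
    return f"{match.group(1)}.{match.group(2)}"
```

A name such as libdl-2.5.so does not fit the pattern, so the function returns None for it, which matches the observation that the scheme is not strict.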

/etc/ld.so.cache, symbolic links: Maintained by ldconfig

We just saw that ld.so.cache contains cross-references from a library’s SONAME to an actual file path, though that path usually leads to a symbolic link pointing to the actual .so file needed. But what creates/updates this set of data and links?

That’s the job of ldconfig — a program that can be run at any time (for example as part of an install process), and in most systems is set to run at every boot-up, to be on the safe side.

ldconfig input

ldconfig needs to know what directories to survey. By default it surveys “trusted directories” /lib and /usr/lib. In addition, ldconfig consults configuration file /etc/ld.so.conf, a text file which provides a list of directories to survey.

According to one convention, ld.so.conf contains an instruction: “include /etc/ld.so.conf.d/*.conf”, establishing a directory in which new packages can place their own xyz.conf file to make their own lib directories available.
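To see which directories such a configuration ends up naming, the include convention can be followed by hand. The Python sketch below is a simplified reading of the file format, with assumptions: one directory per line, "#" starting a comment, and relative include patterns resolved against the including file's directory. The function name is made up.

```python
import glob
import os


def read_conf_dirs(conf_path):
    """List the directories named in an ld.so.conf-style file, following includes."""
    dirs = []
    with open(conf_path) as conf:
        for raw in conf:
            line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            parts = line.split(None, 1)
            if parts[0] == "include" and len(parts) == 2:
                # Assumption: relative patterns are relative to this file's directory.
                pattern = os.path.join(os.path.dirname(conf_path), parts[1].strip())
                for included in sorted(glob.glob(pattern)):
                    dirs.extend(read_conf_dirs(included))
            else:
                dirs.append(line)
    return dirs
```

Calling read_conf_dirs("/etc/ld.so.conf") on a typical system would then list the extra directories ldconfig surveys, on top of the trusted ones.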

ldconfig effect

A default (no args) invocation of ldconfig surveys the listed directories and reads the SONAME information from each library file. With this info, ldconfig performs two main actions:

  • Link: Create (or replace) a symbolic link whose name matches soname (eg: libxyz.so.3), pointing to the actual library file (eg: libxyz.so.3.4).
  • ld.so.cache item: Create or replace an item in ld.so.cache which cross-references soname to the full path to the just-mentioned symbolic link. (eg: libxyz.so.3 –> /usr/lib/libxyz.so.3)

Assuming that all dependent programs and libraries list required libraries (their NEEDED list) by their sonames, this ldconfig activity will result in the necessary info to allow ld-linux to find them when an executable program is loaded.
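The two actions can be sketched as a pure planning step in Python, without touching the filesystem. This is a toy model rather than ldconfig's actual logic: the SONAME of each file is passed in instead of being read from the ELF header, and choosing the "newest" file by comparing the numbers in its name is an assumption of the sketch. The function name is invented.

```python
import posixpath
import re


def plan_links(directory, sonames_by_file):
    """Return (links, cache) for one surveyed directory.

    sonames_by_file maps each real library file name to the SONAME recorded
    inside it (what readelf -d would report).
    """
    newest = {}  # soname -> (version key, file name)
    for filename, soname in sonames_by_file.items():
        key = [int(n) for n in re.findall(r"\d+", filename)]
        if soname not in newest or key > newest[soname][0]:
            newest[soname] = (key, filename)

    links = {}  # symbolic link name -> file it should point to
    cache = {}  # soname -> full path recorded in the cache
    for soname, (_key, filename) in newest.items():
        if soname != filename:  # no link needed if the file is named by its soname
            links[soname] = filename
        cache[soname] = posixpath.join(directory, soname)
    return links, cache
```

With libxyz.so.3.4 and libxyz.so.3.10 both present and both carrying the soname libxyz.so.3, the plan links libxyz.so.3 to libxyz.so.3.10 and records /usr/lib/libxyz.so.3 in the cache.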

Related issues

Additional links

If you inspect a lib directory, you may see symbolic links beyond just the ones that match the SONAME of the corresponding library file. For example, there might be a link like:    libxyz.so –> libxyz.so.3. In other words, a link whose name ends in the plain no-version .so suffix.   These are not created by ldconfig, but are provided by some other means during installation, often by the library's development package.  They are for the use of the link editor at build time: an option such as -lxyz makes ld look for libxyz.so, and the SONAME of the file it finds is what gets recorded in the program's NEEDED list.

6 Comments

  1. Daniel Taylor
    Posted 2009-05-07 at 8:41 am

    The link mechanism is to allow for multiple versions of the same library to be simultaneously installed.

    The plain libfoo.so name links to the system-preferred library in that case and dynamically linked programs that do not have a hard-coded library version preference will use the system preferred library in that case.

    This is particularly useful when doing library development, as it allows for rapid switching between different versions of a library for benchmarking and regression testing.

    • Posted 2012-04-12 at 6:50 pm

      In that case the ldd will show an entry like:

      libxyz.so => /lib/libxyz.so

      ?

      and what about the addresses that follows the entries returned by ldd, what they mean?

      pretty nice post.

  2. Arjuna Wijesurendra
    Posted 2012-01-01 at 7:03 pm

    Commendable write up. I spotted a minor typo under the section “ldconfig input”: the “trusted directories” are /lib and /usr/lib.

  3. Posted 2012-01-01 at 9:03 pm

    Thanks Arjuna for spotting that and taking the time to comment. I have updated accordingly. — Graham

  4. Posted 2013-09-12 at 2:42 pm

    Great post!

    I found a typo. Look for this text:

    According to one convention, ld.so.conf contains an instruction: “include /etc/ld.so.cond.d/*.conf“

    The “cond” should be “conf”
