Linux package installation, dependencies, loading

Several interlocking topics relate to understanding the installation of linux packages. What dependencies does the installation need to take care of? Much of the answer lies in how executables are found and loaded when requested.

Overview

A major component of installation success involves positioning a package’s files in an orderly way within the directory tree, and setting mechanisms (links, PATH etc) so that these files can be found when needed:

  • how the parts of this package are found by the user (via a shell),
  • how executables in this package find libraries (shared objects) and
  • how other packages find and call parts of this package.

Getting all that done is the intent of package managers like rpm and yum, or of the configure-make process if installing from source. However, these methods need to be used with some knowledge of the tradeoffs to be made when selecting options — and when an installation doesn’t work properly, there will be troubleshooting and remedial measures to perform. This article looks first at the issue of directory structure conventions, and then at various dependency scenarios, and how files are found in each case (links, PATH etc).

Typical package directory layout

Most application packages install a suite of files that follows one of several unix/linux conventions for directory structure, and these conventions differ from one another. One such convention is the Filesystem Hierarchy Standard (FHS), which prescribes alternatives for how a package could arrange its files (it is also described in the Red Hat 9 documentation and in a less readable RHEL5 version). For comparison, the Apache source comes with a file, config.layout, which offers over a dozen different layouts for unix, several of which are plausible on linux and conform to the different conventions described below. For the most part, an application package will have directories that follow a structure like this:

  • …/InstallDir/
    • bin (programs and scripts)
    • lib (library shared objects)
    • libexec (“small” utility programs called by other programs)
    • include (header file source code)
    • source (source code)
    • and so on

Conventions for where this is to be placed on a system include:

  • InstallDir equals /usr. This results in the package’s programs in /usr/bin, the libraries in /usr/lib, and so on.
  • InstallDir equals /usr/local. This results in the package’s programs in /usr/local/bin, the libraries in /usr/local/lib, and so on.

In the above two cases, the package’s new files are dropped into directories that already exist and already contain files from previously installed packages. This has obvious potential for occasional filename collisions, and is a muddle to deal with manually. If the files have been installed by, for example, rpm, then rpm can keep track of the muddle, one hopes.

The benefit of installing to these standard directories is that users will very likely already have PATH set up to point to them, so no special actions are needed to be able to invoke the new programs. The libraries in /usr/lib are likewise accessible with less effort than otherwise (as detailed in discussions below), though not those in /usr/local/lib.

Other directory layout conventions:

  • InstallDir equals /usr/local/xyz (where xyz is the package name, like “apache2” or “mysql”).
  • InstallDir equals /opt/xyz (where xyz is the package name, like “apache2” or “mysql”).
  • InstallDir equals /home/fred/myprogs/xyz. User fred installs an application for their own use only.

In these cases, the package creates a completely new subdirectory for its files and directory tree. This keeps the files separated from those of other packages, and also avoids collisions between packages, or between different versions of the same package. On the downside, more steps are required to allow these programs to be launched easily from a shell, and for these libraries to be callable from other packages.

Sidenote: Many packages have configuration files, such as “xyz.conf” or similar. Conventional locations for these:

  • /etc/xyz.conf: Many configuration files are stored directly in /etc.
  • /etc/xyz/xyz.conf: Some packages create a subdirectory of /etc, in which to store possibly several config files.
  • InstallDir/etc/xyz.conf, or InstallDir/etc/conf/xyz.conf. If a package is installed in its own separate directory tree (eg: /opt/xyz/) then it may maintain its config file within that tree, usually in a subdir etc/, with possible additional subdirs.
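The conventional locations above can be turned into a simple lookup. The following is only a sketch: the function names config_candidates and find_config are hypothetical, the file name xyz.conf follows the placeholder used in the list, and the search order is an assumption, since each real package decides for itself where (and in what order) it looks.

```python
from pathlib import Path


def config_candidates(package, install_dir=None):
    """List conventional config-file locations for a package.

    The order (system-wide /etc first, then the package's own tree)
    is an assumption made for this sketch, not a standard.
    """
    name = f"{package}.conf"
    candidates = [
        Path("/etc") / name,            # /etc/xyz.conf
        Path("/etc") / package / name,  # /etc/xyz/xyz.conf
    ]
    if install_dir is not None:
        base = Path(install_dir) / "etc"
        candidates.append(base / name)           # InstallDir/etc/xyz.conf
        candidates.append(base / "conf" / name)  # InstallDir/etc/conf/xyz.conf
    return candidates


def find_config(package, install_dir=None):
    """Return the first candidate that exists as a file, or None."""
    for path in config_candidates(package, install_dir):
        if path.is_file():
            return path
    return None
```

For example, config_candidates("xyz", "/opt/xyz") lists the two /etc locations followed by the two locations under /opt/xyz/etc.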

With these variations in package installation locations covered, next let’s look at the various dependency situations that need to find those packages and access or call their files.

How an executable is found when launched from a shell

This is probably the dependency most exposed to end-users: How does the shell find executables (programs or scripts) when a user requests one from the shell prompt? The widely-known answers:

  1. Explicit path: If the user types the name of the executable file preceded by an absolute or relative directory path, then the shell looks for the program only at that specific location, and if it’s there, loads and runs it.
  2. Program name only: If the user types just the name of the executable, then the shell looks at the PATH environment variable, and proceeding left to right in PATH’s list of directories, the shell attempts to find and run the executable at each of those locations, until one is found.
  3. Special case: in same dir: In many, if not most, user setups, the present working directory is not listed in PATH. So when the user types only the executable file name (no path) and the file happens to be located in the shell’s present working directory, this does not launch that executable. Instead, the user is required to type ./progname (this is intended as a safety feature).
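The three rules above can be sketched in a few lines of Python. This is an illustration of the lookup logic, not the actual code of any shell; the names find_in_path and _is_executable are hypothetical. Python’s standard library offers shutil.which for real use.

```python
import os


def _is_executable(path):
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_in_path(program, path=None):
    """Mimic how a shell locates an executable (simplified sketch)."""
    # Rule 1: a name containing a slash is an explicit path (absolute or
    # relative), so only that one location is checked and PATH is ignored.
    if "/" in program:
        return program if _is_executable(program) else None

    # Rule 2: a bare name is searched for in each PATH directory, left to right.
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            # Sketch simplification: skip empty entries. A real shell treats
            # an empty PATH entry as the current directory.
            continue
        candidate = os.path.join(directory, program)
        if _is_executable(candidate):
            return candidate

    # Rule 3: the current directory is not searched unless PATH lists it,
    # which is why the user has to type ./progname.
    return None
```

Calling find_in_path("ls") typically returns something like /bin/ls or /usr/bin/ls, depending on the system and on the order of the entries in PATH.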

Implications for installation and invocation

Either:

  • Executables need to be installed at a directory location that very likely already appears in users’ PATH variable (such as /bin or /usr/bin). Or…
  • An additional directory entry needs to be added to the user’s PATH variable. Or…
  • The user is required to type in the entire path name to invoke the executable, or perhaps make their own script to do so, etc.

Within-package dependencies

Some dependencies might be termed “within-package” dependencies: dependencies between features of this one package. An example would be the association between mysql server and its database files. This kind of dependency is often covered by statements in configuration files, here mysql’s configuration file /etc/my.cnf. In other cases, a package’s own features might be located at directory paths that are hard-coded into the executable, possibly as set by an option during the initial “configure” process (if installing from source).

Within- and between-package dependencies: libraries and ld-linux

Packages generally need to call the services of other packages. Probably all programs need to call the libraries of the GNU C Library (glibc), which is supplied with the linux operating system. Some user-installed packages need to call the services of other user-installed packages: for example, Apache can be configured to call on php, and php can call on mysql.

Libraries are generally compiled into executable “shared object” library files, with internal names like xyz.so, and actual filenames like xyz.so.0.5 (which include a version number). Where should these be installed, and how does one package find the .so’s of glibc or of another package? Much of this story revolves around the linux conventions by which one executable declares that it depends on certain others, and the actions of the linux loader (ld-linux) in finding and loading those libraries when it loads an executable. A detailed discussion of ld-linux, the associated ld.so.cache and ld.so.conf, and the tools that work with these is given separately.

Implications for installation and invocation

In brief, the implications include these alternatives:

  • Put new .so’s in directories where ld-linux already looks, such as /usr/lib (ie: InstallDir equals /usr). Or…
  • Add configuration information (an entry in ld.so.conf, followed by running ldconfig to rebuild ld.so.cache) so that ld-linux looks in the new directories that your installation has added. Or…
  • Have users launch programs that need your .so’s using a script that augments the LD_LIBRARY_PATH environment variable to tell ld-linux additional places to look. Or…
  • The developers could include instructions in the executable (possibly settable using configure during install) that tell ld-linux where to find libraries.

Note that if the new package includes .so’s for its own use only, it still needs ld-linux to load those .so’s, and thus needs one of the above methods to tell ld-linux where to find them. More details on how ld-linux finds libraries, and tools for inspecting and troubleshooting, are covered in a separate detailed discussion.
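To make the alternatives above concrete, here is a rough Python model of the order in which the loader consults them. It is a simplified sketch, not the loader’s actual implementation: the function name find_shared_object and its parameters are hypothetical, conf_dirs stands in for the directories that ldconfig has recorded in ld.so.cache, runpath stands in for a search path embedded in the executable, and the older embedded form that is searched before LD_LIBRARY_PATH is left out.

```python
import os

# Directories the loader trusts by default (simplified for this sketch).
DEFAULT_DIRS = ("/lib", "/usr/lib")


def find_shared_object(soname, ld_library_path="", runpath=(),
                       conf_dirs=(), default_dirs=DEFAULT_DIRS):
    """Return the first directory hit for soname, or None (simplified model)."""
    search = []
    # 1. Directories named in the LD_LIBRARY_PATH environment variable.
    search += [d for d in ld_library_path.split(":") if d]
    # 2. A search path embedded in the executable at build time.
    search += list(runpath)
    # 3. Directories from ld.so.conf, as cached by ldconfig.
    search += list(conf_dirs)
    # 4. The default library directories.
    search += list(default_dirs)

    for directory in search:
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The ordering is the point of the sketch: a library found via LD_LIBRARY_PATH shadows one with the same name in a configured or default directory, which is both why the launch-script approach works and why it can cause surprises.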

Alternative between-package case: add-ons

Described above are cases where program ProgA calls libraries of package PackB, usually helped by PackB having made sure that ld-linux knows how to find PackB’s .so’s. An alternative plan is for A to define a convention for add-ons, and to allow packages like B to place an interface library .so in a special subdirectory within A’s directories. This idea is exemplified by Apache, with its modules subdirectory, in which php’s libphp5.so can be placed. (This also requires a LoadModule statement in apache’s conf/httpd.conf.)

Between-package dependencies: programs, scripts or data files

There are a variety of other situations where a program needs to access the features of another package:

  • Program ProgA needs to read or write files associated with package PackB. Most likely, ProgA is told about the location of PackB’s files via a configuration setting in ProgA’s config file(s).
  • ProgA runs a program or script ProgB that’s part of package PackB (via spawning a new process). This might seem similar to calling a library of PackB, but it isn’t. In this case, ProgA invokes ProgB just like a user would via a shell, and so the same considerations apply: either ProgA must know the exact path to ProgB, or the PATH environment variable that’s in effect must include the path to ProgB.
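The second case can be sketched as follows, with a Python program playing the role of ProgA. Everything named here is hypothetical: run_prog_b, the default program name progb, and the extra_path_dirs parameter are illustrative only. The sketch shows the two options from the text: an exact path, or a lookup through PATH (optionally extended with extra directories).

```python
import os
import shutil
import subprocess


def run_prog_b(args, prog_b="progb", explicit_path=None, extra_path_dirs=()):
    """Run ProgB as a child process and return the completed process."""
    if explicit_path is not None:
        # Option 1: ProgA knows the exact path to ProgB; PATH is not used.
        executable = explicit_path
    else:
        # Option 2: look ProgB up by name, searching any extra directories
        # first and then the PATH currently in effect.
        search_path = os.pathsep.join(
            list(extra_path_dirs) + [os.environ.get("PATH", "")]
        )
        executable = shutil.which(prog_b, path=search_path)
        if executable is None:
            raise FileNotFoundError(f"{prog_b} not found on PATH")

    # check=True raises CalledProcessError if ProgB exits with an error.
    return subprocess.run(
        [executable, *args], capture_output=True, text=True, check=True
    )
```

A caller that installed PackB under /opt/xyz might use run_prog_b(["--version"], extra_path_dirs=["/opt/xyz/bin"]); the path and the flag are made up for illustration.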
