Adventures in Autoconfiscation - Part Two of Three

In part one I sketched, in rather breathless terms, a brief history of my XML toolkit Arabica and its evolution. I discussed why I decided to replace Arabica's wobbly build system with something more reliable. The "something more reliable" was, I declared, despite a previous failed attempt some years ago, GNU Autotools. In this article, I describe how I made the change and examine whether it really did do what I'd hoped - let more people build Arabica on more platforms, more easily, with less fuss and less effort on my part. This isn't the definitive guide to Autoconf; it's the how-I-did-it narrative, which I hope will inform and entertain.

An Autoconf, Automake, Libtool Fly-past

So what is Autotools? Autotools is actually three things - Autoconf, Automake, and Libtool. As far as I can tell Autotools isn't an official name for this little trinity, and while you can use one without the others, in reality nobody does. Autoconf, Automake, and Libtool are tools for portably building and installing applications and libraries on UNIX-like systems. They provide the magic behind the familiar ./configure; make; make install incantation.

When you run ./configure, you're running a shell script which probes your system for various bits and pieces. With the information gathered it processes a number of templates to generate output files. Those files are usually (but not necessarily exclusively) Makefiles, and a header file commonly called config.h.

Autoconf is the tool that generates the configure script. The autoconf command looks for a file called configure.ac, processes the macros it contains, and a fresh configure script pops out the end. The configure.ac macros are written in a language called M4. M4 is probably unfamiliar, but isn't difficult to get the hang of. Autoconf includes an extensive library of macros, and writing your own is straightforward. (M4 seems to go back to early in Unix history, but I'm not aware of it being widely used outside of Autoconf. The GNU version is still active though, with the latest release as recently as November 2006.)

Automake is a Makefile generator. Well, almost. It's a Makefile template generator. Automake looks for a file called Makefile.am, from which it creates a Makefile.in, based on the macros it finds. Later, configure will use the Makefile.in to generate a Makefile.

Libtool looks after the oddities of building, linking, and loading static or shared libraries. It takes care of invoking the compiler and linker properly, as well as installing libraries and binaries which use the libraries according to your platform's conventions. It will, for example, relink after installation if required. In my experience, once you've added the appropriate macro to configure.ac, you don't actually have to do anything with Libtool directly. configure will build a shell script called libtool, specific to your setup, which your Makefiles will invoke as and when.

In addition to the autoconf and automake commands I've already mentioned, the various Autotools packages include several other commands: autoheader, for example, which generates a skeleton config.h, and aclocal, which builds a local copy of the various M4 macros used in configure.ac. They manage some of the other support files used to generate configure, or when configure is run. Changes to configure.ac or a Makefile.am might require any or all of these commands to be run, and in the correct order. Fortunately, there's an uber-command to manage all that for you, autoreconf. Actually in the course of writing this article, I discovered that the dependencies are included in the generated Makefile, so running make at the top level will also provoke everything to regenerate if required.
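To give a feel for what autoreconf saves you from, here is a rough sketch of the individual steps written out as a shell script. The order shown is my assumption, based on what each tool reads and writes; autoreconf itself decides which steps are needed and may repeat some of them. The run and regenerate helpers are hypothetical names, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the regeneration steps that autoreconf automates.
# The order is an assumption based on what each tool consumes; autoreconf
# itself works out which steps are actually required.
# Set DRY_RUN=0 to execute the commands instead of printing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

regenerate() {
    run aclocal                  # gather macro definitions into aclocal.m4
    run libtoolize --copy        # install ltmain.sh (only if Libtool is used)
    run autoconf                 # configure.ac -> configure
    run autoheader               # config.h template (only with AC_CONFIG_HEADERS)
    run automake --add-missing   # Makefile.am -> Makefile.in
}

regenerate
```

In everyday use a plain autoreconf does all of this for you; the sketch is only there to show why the order matters.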

I may not have made it entirely clear, but Autotools are developer tools. Once the configure script and Makefile.in files have been generated, they become part of the source distribution. People building your source obviously need to have a compiler, but they do not themselves need Autotools.

That's the overview, and is as much (in fact slightly more) as I knew when I started converting Arabica from its wobbly collection of Makefiles to Autotools.

Building Arabica

The Arabica package includes the Arabica library itself, several test executables, and a number of sample applications. Clearly since the executables use the library, the library must be built first. It requires either the Expat, libxml2, or Xerces XML parser, and needs to include different source files according to the parser used. Executables using Arabica's XPath or XSLT engines need a recent version of Boost. The test programs can be built in both narrow and wide string versions, although not all operating-system/compiler/library combinations support wide strings.

As prerequisites and build options go, Arabica isn't particularly extreme, but it's awkward enough to have played a part in prompting my move to Autotools.

The Arabica source layout is relatively conventional:

/
  /include          
  /src               
  /tests/SAX
        /DOM
        /XPath
        ....
  /examples/SAX
           /DOM
           /XPath
           ...

Arabica is primarily implemented in header files, so the bulk of the source sits in /include. The library source of around 20 files, mainly code conversion facets, sits in /src.

Starting small

After discarding my existing Makefiles, I had a big pile of source and no way to build it. My initial goal was to use Autotools to build the Arabica library. I wasn't going to worry about prerequisites, I'd simply assume they were there, nor would I worry about building anything else.

I spent a little time, only an hour or so, reading the configure.ac and Makefile.am files used in a few packages I'd recently built from source. Not surprisingly, I found smaller library packages like Expat more informative than the enormous tool suites like GCC. Finally I plugged configure.ac into Google, which pointed me to the Autoconf manual. Reading the manual hadn't helped a great deal last time around, but both the manual and my comprehension skills have improved since then.

The manual states that the order in which configure.ac calls macros is not generally important, but that it must contain a call to AC_INIT to get started and a call to AC_OUTPUT at the end. We'll come to those in a moment. The suggested configure.ac layout is

configure.ac:

AC_INIT(package, version, bug-report-address)
information on the package
checks for programs
checks for libraries
checks for headers
checks for types
checks for structures
checks for compiler characteristics
checks for library functions
checks for system services
AC_CONFIG_FILES([file ...])
AC_OUTPUT

That's a lot of checks to think about, which wasn't what I really wanted to do. So what's the smallest configure.ac you can write?

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

Let's try that.

$ autoreconf
$ ./configure

It worked! It worked in as much as I got a configure script, and the script ran. Note that the M4 quote characters are '[' and ']'.

Since I'm building a library written in C++, I need to initialise Automake, find a C++ compiler, and set up libtool.

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

AM_INIT_AUTOMAKE

AC_PROG_CXX
AC_PROG_LIBTOOL

Macros starting AC_ are Autoconf macros while macros beginning AM_ are Automake macros, described in the Autoconf and Automake manuals respectively. The exception is AC_PROG_LIBTOOL, which, despite its prefix, is supplied by Libtool and is the macro that sets Libtool up.

Running that ...

$ autoreconf
configure.ac:6: required file `./config.sub' not found
configure.ac:6:   `automake --add-missing' can install `config.sub'
configure.ac:3: required file `./missing' not found
configure.ac:3:   `automake --add-missing' can install `missing'
configure.ac:3: required file `./install-sh' not found
configure.ac:3:   `automake --add-missing' can install `install-sh'
configure.ac:6: required file `./config.guess' not found
configure.ac:6:   `automake --add-missing' can install `config.guess'
automake-1.10: no `Makefile.am' found for any configure output
autoreconf-2.60: automake failed with exit status: 1

Ah-ha. Not so good. The missing files are shell scripts used by configure during a build. They are included in the Automake package, and running automake --add-missing pulls them into your source tree by adding symbolic links. I take the view that symbolic links that point out of a source tree really don't mix with sensible source control procedures, and so removed the symlinks and copied the files across.
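Swapping the links for real files is easy enough to script. The following is a minimal sketch of what I did by hand, with a hypothetical function name; it replaces each symbolic link in a directory with a copy of the file it points to. As far as I can tell, running automake --add-missing --copy avoids creating the links in the first place.

```shell
#!/bin/sh
# Replace every symbolic link in a directory with a copy of its target.
# Sketch only; the function name is hypothetical.
replace_symlinks_with_copies() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a symlink.
        [ -L "$f" ] || continue
        tmp="$f.copy.$$"
        # cp -L follows the link and copies the file it points to.
        # A dangling link makes cp fail, in which case the link is left alone.
        cp -L "$f" "$tmp" && rm "$f" && mv "$tmp" "$f"
    done
}

# Example: tidy the top of the source tree after automake --add-missing.
# replace_symlinks_with_copies .
```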

I created an empty Makefile.am, and reran autoreconf.

$ autoreconf
automake-1.10: no `Makefile.am' found for any configure output
automake-1.10: Did you forget AC_CONFIG_FILES([Makefile]) in configure.ac?
autoreconf-2.60: automake failed with exit status: 1

Well, I didn't forget, I just hadn't yet.

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

AM_INIT_AUTOMAKE

AC_PROG_CXX
AC_PROG_LIBTOOL

AC_CONFIG_FILES([Makefile])
AC_OUTPUT

AC_CONFIG_FILES(filename) primes the AC_OUTPUT macro to create 'filename'. The file is created by copying an input file (by default 'filename.in'), substituting variables as it goes. For Makefiles, automake creates the Makefile.in from Makefile.am. I don't know exactly how or when this happens, but it does. I have faith.
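The substitution step, at least, is easy to imitate by hand. What follows is only a rough imitation of the idea, not what configure really does internally: it assumes a made-up template called demo.in containing @CXX@ and @top_srcdir@ placeholders, and fills them in with sed. The file names and the values are hypothetical.

```shell
#!/bin/sh
# Rough imitation of configure's template substitution (illustration only).
# demo.in, demo.out and the values below are hypothetical.
workdir=$(mktemp -d)

# A template in the style of a Makefile.in: placeholders are written @NAME@.
cat > "$workdir/demo.in" <<'EOF'
CXX = @CXX@
top_srcdir = @top_srcdir@
EOF

# Values a real configure run would have discovered by probing the system.
cxx_value="g++"
top_srcdir_value=".."

# Copy the template to its output name, replacing each placeholder.
sed -e "s|@CXX@|$cxx_value|g" \
    -e "s|@top_srcdir@|$top_srcdir_value|g" \
    "$workdir/demo.in" > "$workdir/demo.out"

cat "$workdir/demo.out"
```

The real thing handles many more variables, but the principle of copying filename.in to filename with the placeholders filled in is the same.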

$ autoreconf
Makefile.am: required file `./INSTALL' not found
Makefile.am: `automake --add-missing' can install `INSTALL'
Makefile.am: required file `./NEWS' not found
Makefile.am: required file `./README' not found
Makefile.am: required file `./AUTHORS' not found
Makefile.am: required file `./ChangeLog' not found
Makefile.am: required file `./COPYING' not found
Makefile.am: `automake --add-missing' can install `COPYING'
configure.ac:6: required file `./ltmain.sh' not found
autoreconf-2.60: automake failed with exit status: 1

I'm curious as to why these missing files weren't picked up last time around, but I ran automake --add-missing again anyway. Since Autotools were developed for the GNU project, they expect certain things - NEWS, README, etc - to conform to GNU standards. It is possible to relax this requirement, but I didn't bother. Be aware that the COPYING file automake creates for you is the GNU General Public License, as you may want to substitute another license.
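For the files automake won't create for you, empty placeholders are enough to get going. Here is a small sketch, with a hypothetical function name, that creates any that are missing and leaves existing files alone. If you would rather relax the requirement, my understanding is that passing the foreign option, as in AM_INIT_AUTOMAKE([foreign]), is the way to do it.

```shell
#!/bin/sh
# Create empty placeholders for the files a GNU-style package is expected
# to carry, without touching any that already exist.
# INSTALL and COPYING are left out because automake --add-missing
# can supply those itself.
create_gnu_placeholders() {
    dir=${1:-.}
    for name in NEWS README AUTHORS ChangeLog; do
        if [ ! -e "$dir/$name" ]; then
            : > "$dir/$name"
            echo "created $dir/$name"
        fi
    done
}

# Example: run from the top of the source tree.
# create_gnu_placeholders .
```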

It took me a while to find where ltmain.sh should come from. I discovered that autoreconf also has an option to create or copy the missing support files.

$ autoreconf --install
$ autoreconf
$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for g++... g++
... 85 other lines snipped ...
configure: creating ./config.status
config.status: creating Makefile
config.status: executing depfiles commands
$ make
make: Nothing to be done for 'all'.

Well, that looks almost convincing. Now to get it to build something.

Getting the library built

So far I've been working in the root directory of my source tree. The Arabica library source sits in the /src subdirectory, so I want my top level Makefile to call src/Makefile. Since automake is creating my Makefiles, I'll need Makefile.am and src/Makefile.am. Recursive make invocations can be contentious, and I know they are anathema to some, so I'll just note that Autotools doesn't require them. Personally, I've no particular aversion to recursive Makefiles. Perhaps I should say that I haven't (yet) had any problems.

The top level Makefile.am is straightforward. Subdirectories are specified using the SUBDIRS variable. You can specify any number of directories, and they will be visited in the order given. The subdirectories don't have to contain a Makefile.am, only a Makefile, which allows third-party packages to be included in the build. At the moment, I'm only interested in the one subdirectory.

Makefile.am:

SUBDIRS = src

Running autoreconf, configure and make -

$ make
Making all in src
make[1]: Entering directory '/home/jez/arabica/src'
make[1]: *** No rule to make target 'all'. Stop.
make[1]: Leaving directory '/home/jez/arabica/src'
make: *** [all-recursive] Error 1

I'm on the verge of actually building something. What goes in src/Makefile.am? Automake uses a combination of variables and naming conventions to describe what should be built. A variable which tells automake what is being built is called the primary. The _LTLIBRARIES primary declares libraries that are built with libtool.

src/Makefile.am:

lib_LTLIBRARIES = libarabica.la
libarabica_la_SOURCES = arabica.cpp \
                        SAX/helpers/InputSourceResolver.cpp \
                        ... other source files ...

Automake defines a number of Makefile targets automatically, including install. The lib prefix on the _LTLIBRARIES primary says that the library should be installed in Automake's libdir. libdir usually points to the system's library path (e.g. /usr/local/lib), unless overridden by passing an option to ./configure. Other prefixes include pkglib and noinst, to install to a package specific directory or to mark the library as not to be installed, respectively.
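To make the libdir default a little more concrete, the sketch below mimics how I understand the value to be derived: libdir follows exec_prefix, which follows prefix, which is /usr/local unless told otherwise. The function name is hypothetical, and the real variables are set by configure (via --prefix, --exec-prefix and --libdir), not by a script like this.

```shell
#!/bin/sh
# How the default library directory follows from the install prefix.
# These mirror the conventional defaults; a user can change them with
# ./configure --prefix=DIR or ./configure --libdir=DIR.
default_libdir() {
    prefix=${1:-/usr/local}      # --prefix, /usr/local unless overridden
    exec_prefix=$prefix          # --exec-prefix follows prefix by default
    echo "$exec_prefix/lib"      # --libdir follows exec_prefix by default
}

default_libdir                   # prints /usr/local/lib
default_libdir /opt/arabica      # hypothetical prefix, prints /opt/arabica/lib
```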

Note how the name of the library to build, with the dot replaced by an underscore, becomes the prefix of the _SOURCES variable which gives its source files. Here, I've listed the C++ source files located in the /src directory, but the library also uses a number of header files from the /include directory. A large number, actually, which I was too lazy to list in its entirety. Instead I decided simply to add the /include directory to the compiler include path. Additional flags can be passed to the compiler (or more correctly the preprocessor) using the AM_CPPFLAGS variable.

The backslash character '\' is, as in normal Makefiles, the line continuation character.
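The renaming from libarabica.la to libarabica_la follows a simple rule, which as I read the Automake manual is that every character other than a letter, a digit, '@' or an underscore becomes an underscore. A small sketch, with a hypothetical function name, shows the rule in action:

```shell
#!/bin/sh
# Derive the variable prefix Automake expects for a given target name:
# every character other than a letter, digit, '@' or underscore becomes
# an underscore. The function name is hypothetical.
canonical_name() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_@]/_/g'
}

canonical_name libarabica.la   # prints libarabica_la
canonical_name utils_test      # prints utils_test
```

Appending _SOURCES to the first result gives libarabica_la_SOURCES, the variable used in the listing.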

src/Makefile.am:

AM_CPPFLAGS = -I$(top_srcdir)/include

lib_LTLIBRARIES = libarabica.la
libarabica_la_SOURCES = arabica.cpp \
                        SAX/helpers/InputSourceResolver.cpp \
                        ... other source files ...

$(top_srcdir) is a predefined Autoconf variable, which is the relative path to the top-level source directory. Autoconf and Automake define quite a number of variables like this, pointing to the source directory, the build directory, and so on. Unless a variable's purpose isn't clear from its name, I won't highlight them further.

Now the src/Makefile.am looks complete, the final job is to list it in configure.ac so that src/Makefile will be created.

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

AM_INIT_AUTOMAKE

AC_PROG_CXX
AC_PROG_LIBTOOL

AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([src/Makefile])
AC_OUTPUT

Another round of autoreconf; ./configure; make now gives

$ make
Making all in src
make[1]: Entering directory `/home/jez/work/arabica'
/bin/sh ../libtool --tag=CXX --mode=compile g++ -DPACKAGE_NAME=\"Arabica\" -DPACKAGE_TARNAME=\"arabica\" -DPACKAGE_VERSION=\"Jan07\" -DPACKAGE_STRING=\"Arabica\ Jan07\" -DPACKAGE_BUGREPORT=\"jez@jezuk.co.uk\" -DPACKAGE=\"arabica\" -DVERSION=\"Jan07\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -I. -I../include -g -O2 -MT arabica.lo -MD -MP -MF .deps/arabica.Tpo -c -o arabica.lo arabica.cpp
mkdir .libs
... other source files ...
ar cru .libs/libarabica.a arabica.o InputSourceResolver.o base64codecvt.o iso88591_utf8.o ucs2_utf16.o ucs2_utf8.o iso88591utf8codecvt.o rot13codecvt.o ucs2utf8codecvt.o utf16beucs2codecvt.o utf16leucs2codecvt.o utf16utf8codecvt.o utf8iso88591codecvt.o utf8ucs2codecvt.o XMLCharacterClasses.o
ranlib .libs/libarabica.a
creating libarabica.la
(cd .libs && rm -f libarabica.la && ln -s ../libarabica.la libarabica.la)
make[1]: Leaving directory `/home/jez/work/arabica'
make[1]: Entering directory `/home/jez/work/arabica'
make[1]: Nothing to be done for `all-am'.
make[1]: Leaving directory `/home/jez/work/arabica'

Goal!

It's worked. The library has built. I actually became momentarily light-headed when this happened. Even though I've been using make and writing Makefiles for years now, I generally start a new Makefile by copying an existing one because I don't usually get them right from a standing start. And here I was, after only an afternoon's work, with a configure script that seemed to actually be configuring and working. I'd been developing using Cygwin, so I verified my new configure script on Ubuntu Linux, FreeBSD, DragonflyBSD and GNU/Darwin boxes (virtualisation is a wonderful thing). It worked and the library built on all of them. I went for a lie down.

Building everything else

The library was built, but did it actually work? Time to build the test suite. The build needs to recurse down into the tests subdirectory and again it's off into subdirectories. I added tests to the SUBDIRS variable in the top level Makefile.am, and created a tests/Makefile.am which specifies the next SUBDIRS level. I added extra AC_CONFIG_FILES calls to configure.ac.
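With Makefile.am files multiplying, it's easy to forget one of those AC_CONFIG_FILES calls. As a convenience, something like the following sketch could print a candidate line for every Makefile.am in the tree, ready to paste into configure.ac. The function name is hypothetical, and it isn't part of Autotools.

```shell
#!/bin/sh
# Print an AC_CONFIG_FILES line for every Makefile.am under a directory.
# Sketch only; the function name is hypothetical.
list_config_files() {
    top=${1:-.}
    ( cd "$top" &&
      find . -name Makefile.am |
      sed -e 's|^\./||' -e 's|\.am$||' |   # ./src/Makefile.am -> src/Makefile
      sort |
      sed -e 's|.*|AC_CONFIG_FILES([&])|' )
}

# Example: run from the top of the source tree.
# list_config_files .
```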

Building a program using Autotools is very similar to building a library, but you use the _PROGRAMS primary rather than the _LTLIBRARIES primary.

test/Utils/Makefile.am:

noinst_PROGRAMS = utils_test

AM_CPPFLAGS = -I$(top_srcdir)/include

LIBARABICA = $(top_builddir)/src/libarabica.la

utils_test_SOURCES = utils_test.cpp \
                       ... more source ...
utils_test_LDADD = $(LIBARABICA)
utils_test_DEPENDENCIES = $(LIBARABICA)

Since this is a test program and does not need to be installed, I've given _PROGRAMS the noinst prefix. As with libraries, the _SOURCES variable lists the program's source files. Extra libraries that a program needs to link are given in the _LDADD variable. It is sometimes useful to have a program depend on some other target which is not actually part of that program. This is done using the _DEPENDENCIES variable. I've included libarabica as a dependency to ensure the program is relinked if the library is changed. Note how it's also possible to declare your own variables in a Makefile.am.

So does it work?

$ autoreconf
$ ./configure
... stuff ...
$ make
... more stuff ...
$ test/Utils/utils_test.exe
StringTest
.......
OK (7 tests)

It does.

It sounds silly to say it, but I felt smugly pleased with myself once I had the libraries and the test suite building. For so long, I'd found, as a user, Autotools to be a fantastic thing, because the ./configure; make; make install incantation just worked. As a developer, I'd regarded it as a strange and scary beast. As my test cases passed, the beast was slain.

Compared to my previous build system things had already improved, because Arabica was now a package that would build out-of-the-box on all the platforms I had to hand. My page-long set of build instructions could be thrown away, replaced with a three item bulleted list.

I wasn't yet at one with Autotools, but I was comfortable enough and now had the confidence to start extending the build to check which XML parser was available, find whether the Boost libraries were available, and invoke the test suite. I walk through some of that in part three, as well as looking at some of the expected and unexpected benefits of converting to Autotools.


Jez Higgins