Adventures in Autoconfiscation - Part Three of Three

In the preceding two episodes, I've described my picaresque journey taking my XML toolkit Arabica from its wobbly homegrown build system toward GNU Autotools, the magic behind ./configure; make; make install. As I begin this article, I'm at the point where I have a no-frills build going. Here I cover adding some flexibility to the build, so it adapts to the presence or absence of third party libraries. Finally, I'll do what I said I'd do last time and examine whether the change to GNU Autotools really did do what I hoped - let more people build Arabica on more platforms more easily but with less fuss and effort on my part.

This isn't the definitive guide to Autotools; it's a how-I-did-it narrative which I hope will inform and entertain.

Customising the Build - config.h

The Arabica library builds on top of a third party parser library - expat, libxml2, or Xerces. The old build system required people to know which library they had (or which of several they wanted to use) and the location of its header and shared object files, and to set Makefile flags accordingly. The flags were used to generate a C++ header file, ArabicaConfig.h, containing a number of #defines. Those #defines were, in turn, used to pull in the appropriate library binding. There were other flags too, which controlled things like wide character support, and whether to use Winsock or 'proper' BSD sockets. Some were set on a platform basis, some at the user's discretion. A typical ArabicaConfig.h looked like this:

#ifndef ARABICA_ARABICA_CONFIG_H
#define ARABICA_ARABICA_CONFIG_H

#define ARABICA_NO_WCHAR_T

#define ARABICA_NO_CODECVT_SPECIALISATIONS

#define USE_EXPAT

#endif

Autotools can produce a similar header file, setting flags for all the various bits and pieces it has probed. All you have to do is ask, and you ask by adding the AC_CONFIG_HEADERS macro to configure.ac.

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

AM_INIT_AUTOMAKE

AC_PROG_CXX
AC_PROG_LIBTOOL

AC_CONFIG_HEADERS([include/SAX/ArabicaConfig.h])
AC_CONFIG_FILES([Makefile])
...
AC_OUTPUT

A quick round of autoreconf and ./configure

...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating test/Makefile
config.status: creating test/Utils/Makefile
config.status: creating include/SAX/ArabicaConfig.h
config.status: executing depfiles commands
...

produces an ArabicaConfig.h which looks like this

/* include/SAX/ArabicaConfig.h. Generated from ArabicaConfig.h.in by configure. */
/* include/SAX/ArabicaConfig.h.in. Generated from configure.ac by autoheader. */

/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1

/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1

/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1

... several other #defines I'm not actually interested in ...

Looks like just the job. All I need to do is get my Arabica specific defines in there, and I'm done. This is the kind of thing you see configure scripts doing all the time, so I assumed it must be straightforward.

You extend what configure looks for by writing your own Autoconf macros. Autoconf macros are written in a language called M4. Although it's been around since shortly after the dawn of Unix, M4 isn't something you encounter every day. As it turns out, you're actually unlikely to encounter it writing Autoconf macros either. Almost every significant thing you need to do in a custom Autoconf macro already exists in the Autoconf macro library, so you're working at one remove from raw M4. You can write your macros inline in configure.ac, but it's more common to pop them in an external file and just have the macro call in configure.ac. By convention, macro files go in a subdirectory called m4, where they'll be picked up automatically.

Autoconf macros have the general form

AC_DEFUN([macro-name],
         [macro-body])

The body can extend over several lines if necessary. Remember that configure is a shell script. Custom macros in configure.ac are, therefore, being expanded to shell script fragments, which become part of configure. In that sense they are metaprograms, but I'm probably making it sound more difficult than it is. It should be borne in mind, however, that the output of an Autoconf macro is shell script which will be run later, possibly on an unknown platform. The Autoconf manual has extensive guidance on writing portable shell script, but for common operations an in-depth understanding of quite how awk operates on Ultrix-1 isn't necessary. The following example should make this clearer.

Customising the Build - Finding an XML Parser

Arabica builds on the expat, libxml2, or Xerces XML parsers. I need my configure script to find which of these are available and, if there are multiple choices, to choose amongst them.

The first job was to write a macro to look for expat. I started where most programming endeavours start these days, by googling. It turns out that searching for "[package name] and m4" is a pretty reliable way of finding an Autoconf macro that does (or more or less does) what you want. There's also an extensive collection of over 500 macros in the Autoconf Macro Archive.

Rather than go through the tedious business of writing the macro, I'm going to leap straight to the finished article and highlight the important parts.

m4/has_expat.m4:

AC_DEFUN([ARABICA_HAS_EXPAT],
[
  AC_ARG_WITH([expat],
              [ --with-expat=PREFIX Specify expat library location],
              [],
              [with_expat=yes])

  EXPAT_CFLAGS=
  EXPAT_LIBS=
  if test $with_expat != no; then
    if test $with_expat != yes; then
      expat_possible_path="$with_expat"
    else
      expat_possible_path="/usr /usr/local /opt /var"
    fi
    AC_MSG_CHECKING([for expat headers])
    expat_save_CXXFLAGS="$CXXFLAGS"
    expat_found=no
    for expat_path_tmp in $expat_possible_path ; do
      CXXFLAGS="$CXXFLAGS -I$expat_path_tmp/include"
      AC_COMPILE_IFELSE([@%:@include <expat.h>],
                        [EXPAT_CFLAGS="-I$expat_path_tmp/include"
                         EXPAT_LIBS="-L$expat_path_tmp/lib"
                         expat_found=yes],
                        [])
      CXXFLAGS="$expat_save_CXXFLAGS"
      if test $expat_found = yes; then
        break;
      fi
    done
    AC_MSG_RESULT($expat_found)
    if test $expat_found = yes; then
      AC_CHECK_LIB([expat],
                   [XML_ParserCreate],
                   [ EXPAT_LIBS="$EXPAT_LIBS -lexpat"
                     expat_found=yes ],
                   [ expat_found=no ],
                   "$EXPAT_LIBS")
      if test $expat_found = yes; then
        HAVE_EXPAT=1
      fi
    fi
  fi
])

It's not that bad, is it? Could be worse - modern conveniences like subroutines and variable scoping don't exist in shell script, after all.

This is more or less the canonical form for a macro which looks for a library. First it searches for a known header, then if that's found it tries to link the library, by looking for a known function in that library. If the link succeeds we can declare victory, in this case by setting a flag. When configure reaches the end of this fragment, if the expat library was found, HAVE_EXPAT is set and EXPAT_CFLAGS and EXPAT_LIBS point to the header and library locations. If not, HAVE_EXPAT is not set.

The heavy lifting here is provided by the AC_ARG_WITH, AC_COMPILE_IFELSE, and AC_CHECK_LIB macros.

AC_ARG_WITH(package, help-string, [action-if-given], [action-if-not-given]) describes an argument to the configure script. If the user runs configure with the --with-package or --without-package option, configure runs the shell commands action-if-given. If neither option was given, it runs the shell commands action-if-not-given. The option's argument is available in the shell variable with_package. The --without-package option is equivalent to --with-package=no. In this case, if neither --with-expat nor --without-expat is given, I set the with_expat variable myself.
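
As a rough illustration of that convention, the effect on with_expat can be imitated in a few lines of plain shell. This is only a sketch of the behaviour, not the code Autoconf generates; the function name parse_with_expat is made up for the example, and a real configure script handles far more options.

```shell
#!/bin/sh
# Hypothetical sketch: imitates the effect of AC_ARG_WITH([expat], ...).
# parse_with_expat is an illustrative name, not part of Autoconf.
parse_with_expat() {
  with_expat=                     # stays empty if the user says nothing
  for arg in "$@"; do
    case $arg in
      --with-expat=*)  with_expat=${arg#--with-expat=} ;;
      --with-expat)    with_expat=yes ;;
      --without-expat) with_expat=no ;;   # same as --with-expat=no
    esac
  done
  # action-if-not-given: default to "yes", meaning "search the usual places"
  if test -z "$with_expat"; then
    with_expat=yes
  fi
}

parse_with_expat --with-expat=/opt/expat
echo "with_expat=$with_expat"
```

The three values the macro body then has to cope with are "no" (skip the check), "yes" (search the default prefixes), and anything else (treat it as the prefix to search).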

AC_COMPILE_IFELSE(input, [action-if-true], [action-if-false]) compiles a program, running action-if-true if successful, and running action-if-false otherwise. In this case I just want to try and compile

#include <expat.h>

The '#' character is the shell comment character, so I can't use it directly. Autoconf provides a number of quadrigraphs for special characters. The quadrigraph for '#' is the unpronounceable '@%:@'. If you need to check for something more sophisticated than the mere presence of a header, perhaps its presence and its contents, the AC_LANG_SOURCE and AC_LANG_PROGRAM macros are useful. Note that AC_COMPILE_IFELSE only compiles; it doesn't try to link.
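
The compile-only check can be pictured as the following shell fragment. It is a simplified sketch under stated assumptions, not what Autoconf emits: try_compile is a hypothetical helper name, and the compiler is taken from the CXX variable with c++ as an assumed fallback.

```shell
#!/bin/sh
# Hypothetical sketch of a compile-only check, in the spirit of
# AC_COMPILE_IFELSE. try_compile is an illustrative name, not an Autoconf API.
# Usage: try_compile 'source text'   (returns 0 if the compiler accepts it)
try_compile() {
  try_compile_dir=$(mktemp -d) || return 1
  printf '%s\n' "$1" > "$try_compile_dir/conftest.cpp"
  # -c means compile only: no link step, so a missing library goes unnoticed.
  ${CXX:-c++} $CXXFLAGS -c "$try_compile_dir/conftest.cpp" \
      -o "$try_compile_dir/conftest.o" >/dev/null 2>&1
  try_compile_status=$?
  rm -rf "$try_compile_dir"
  return $try_compile_status
}

if try_compile '#include <expat.h>'; then
  echo "expat headers: yes"
else
  echo "expat headers: no"
fi
```

Because the helper honours CXXFLAGS, adding -I$prefix/include before the call is all it takes to probe a different installation prefix, which is what the loop in the macro does.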

AC_CHECK_LIB(library, function, [action-if-found], [action-if-not-found]) tests whether a library is available by trying to link a test program that calls function.

When the compiler and linker are invoked, the CXXFLAGS and LIBS variables are used to pass the compiler and linker options. This is why the macro keeps copies of the initial values of these variables, and resets them at the end of the script. In a shell script all variables are global, so care must be taken with special variables like these.
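
A link check with the save-and-restore discipline might look like the sketch below. Again this is an approximation rather than Autoconf's generated code: check_lib is a hypothetical name, and the compiler is assumed to come from CC with cc as a fallback.

```shell
#!/bin/sh
# Hypothetical sketch of a link check in the spirit of AC_CHECK_LIB.
# check_lib is an illustrative name. Usage: check_lib library function
check_lib() {
  check_lib_dir=$(mktemp -d) || return 1
  # A dummy declaration is enough: only the symbol has to resolve at
  # link time, its real signature does not matter to the check.
  cat > "$check_lib_dir/conftest.c" <<EOF
char $2 ();
int main (void) { return $2 (); }
EOF
  check_lib_save_LIBS=$LIBS       # shell variables are global: save first,
  LIBS="-l$1 $LIBS"               # then modify for the duration of the test
  ${CC:-cc} $CFLAGS $LDFLAGS -o "$check_lib_dir/conftest" \
      "$check_lib_dir/conftest.c" $LIBS >/dev/null 2>&1
  check_lib_status=$?
  LIBS=$check_lib_save_LIBS       # restore whether the link worked or not
  rm -rf "$check_lib_dir"
  return $check_lib_status
}

if check_lib expat XML_ParserCreate; then
  echo "expat library: yes"
else
  echo "expat library: no"
fi
```

The dummy declaration works for C because the linker only sees the bare function name. It does not carry over to C++, where names are mangled and often live inside namespaces.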

Working from this template, I wrote two further macros to check for libxml2 and Xerces. The Xerces macro is slightly more involved because Xerces is a C++ library. AC_CHECK_LIB plays rather fast and loose with function declarations and is only suitable for checking functions in C libraries. The C++ equivalent is

m4/has_xerces.m4:

...

xerces_save_LIBS="$LIBS"
CXXFLAGS="$CXXFLAGS $XERCES_CFLAGS"
LIBS="$LIBS $XERCES_LIBS -lxerces-c"
AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <xercesc/util/PlatformUtils.hpp>]],
                                [[XERCES_CPP_NAMESPACE::XMLPlatformUtils::Initialize()]])],
               [ XERCES_LIBS="$XERCES_LIBS -lxerces-c"
                 xerces_found=yes],
               [ xerces_found=no])
CXXFLAGS="$xerces_save_CXXFLAGS"
LIBS="$xerces_save_LIBS"

Adding the new macros to configure.ac, preceded by AC_LANG([C++]) to indicate that test programs should be compiled and linked as C++ rather than C, gives

configure.ac:

AC_INIT([Arabica], [Jan07], [jez@jezuk.co.uk])

AM_INIT_AUTOMAKE

AC_PROG_CXX
AC_PROG_LIBTOOL

AC_LANG([C++])
ARABICA_HAS_EXPAT
ARABICA_HAS_LIBXML2
ARABICA_HAS_XERCES

AC_CONFIG_HEADERS([include/SAX/ArabicaConfig.h])
AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([src/Makefile])
AC_CONFIG_FILES([test/Makefile])
AC_CONFIG_FILES([test/Utils/Makefile])
AC_OUTPUT

Going through a round of autoreconf and ./configure results in

$ ./configure --help
`configure' configures Arabica Jan07 to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...

...

Optional Packages:
...
  --with-expat=PREFIX     Specify expat library location
  --with-libxml2=PREFIX   Specify libxml2 library location
  --with-xerces=PREFIX    Specify xerces library location
...

$ ./configure
...
checking how to hardcode library paths into programs... immediate
checking for expat headers... yes
checking for XML_ParserCreate in -lexpat... yes
checking for libxml2 headers... no
checking for Xerces headers... yes
checking for XMLPlatformUtils::Initialize in -lxerces-c... yes
configure: creating ./config.status
...
config.status: include/SAX/ArabicaConfig.h is unchanged
config.status: executing depfiles commands
...

The configure script can detect which XML parsers are available. Now to communicate that to the build. I need to do two things: set a preprocessor symbol in ArabicaConfig.h so I can pull in the appropriate driver, and have the Arabica library link to the parser library.

My ARABICA_HAS_* macros will have set any or all of HAVE_EXPAT, HAVE_LIBXML2, and HAVE_XERCES, together with a matching pair of variables containing compiler and linker flags. I can write a simple if ladder to determine which outputs to set. But how to set those outputs?

The AC_DEFINE macro adds a symbol to the config header. It has the form AC_DEFINE(variable, value, [description]) and is just what we need.

The compiler and linker flags clearly need to be passed to the compiler and linker. They are invoked by make, so the flags need to be set in the Makefile. Autoconf's AC_SUBST(variable, [value]) macro performs variable replacement in the output files, substituting instances of @variable@ with the value. This is the mechanism for getting the flags from the configure script into the Makefiles.
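
Both mechanisms boil down to template substitution, which can be imitated with sed. The fragment below is a much simplified stand-in for what config.status does; the template contents, paths and values are examples chosen for illustration, not output captured from a real run.

```shell
#!/bin/sh
# Hypothetical, much simplified imitation of what config.status does with
# the results of AC_DEFINE and AC_SUBST. File contents are examples only.
workdir=$(mktemp -d) || exit 1

# Templates, roughly as autoheader and automake might leave them.
printf '%s\n' '/* define to build against Expat */' '#undef USE_EXPAT' \
    > "$workdir/ArabicaConfig.h.in"
printf '%s\n' 'AM_CPPFLAGS = -I../include @PARSER_HEADERS@' \
    'libarabica_la_LDFLAGS = @PARSER_LIBS@' > "$workdir/Makefile.in"

# Values a configure run might have worked out (assumed for the example).
PARSER_HEADERS="-I/usr/include"
PARSER_LIBS="-L/usr/lib -lexpat"

# AC_DEFINE: the #undef placeholder in the header becomes a #define.
sed 's|^#undef USE_EXPAT$|#define USE_EXPAT|' \
    "$workdir/ArabicaConfig.h.in" > "$workdir/ArabicaConfig.h"

# AC_SUBST: every @variable@ is replaced by the variable's value.
sed -e "s|@PARSER_HEADERS@|$PARSER_HEADERS|g" \
    -e "s|@PARSER_LIBS@|$PARSER_LIBS|g" \
    "$workdir/Makefile.in" > "$workdir/Makefile"

cat "$workdir/ArabicaConfig.h" "$workdir/Makefile"
```

The practical consequence is that nothing reaches the header or the Makefiles unless it has been named in an AC_DEFINE or AC_SUBST call; an ordinary shell variable set in configure stays inside configure.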

Armed with these two macros, I wrote a further macro to select the parser

m4/select_parser.m4:

AC_DEFUN([ARABICA_HAS_XML_PARSER],
[
  if test "$HAVE_EXPAT" = "1"; then
    AC_DEFINE([USE_EXPAT], ,[define to build against Expat])
    AC_SUBST([PARSER_HEADERS], $EXPAT_CFLAGS)
    AC_SUBST([PARSER_LIBS], $EXPAT_LIBS)
  elif test "$HAVE_LIBXML2" = "1"; then
    ...
  else
    AC_MSG_ERROR([[Cannot find an XML parser library. Arabica needs one of Expat, LibXML2 or Xerces]])
  fi
])

and added it to configure.ac. I updated src/Makefile.am[9] to add placeholders for the compiler and linker flags

src/Makefile.am:

...
AM_CPPFLAGS = -I$(top_srcdir)/include @PARSER_HEADERS@
...
libarabica_la_LDFLAGS= @PARSER_LIBS@

And once more around the autoreconf and ./configure loop. Towards the bottom of ArabicaConfig.h we find

include/SAX/ArabicaConfig.h:

...
/* define to build against Expat */
#define USE_EXPAT

Even better than that, it actually builds too.

$ make
Making all in src
make[1]: Entering directory `/home/jez/work/arabica'
if /bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I. -I../include/SAX -I../include -I/usr/include -g -O2 -MT arabica.lo -MD -MP -MF ".deps/arabica.Tpo" -c -o arabica.lo arabica.cpp; \
...
/bin/sh ../libtool --tag=CXX --mode=link g++ -g -O2 -o libarabica.la -rpath /usr/local/lib -L/usr/lib -lexpat arabica.lo InputSourceResolver.lo base64codecvt.lo iso88591_utf8.lo ucs2_utf16.lo ucs2_utf8.lo iso88591utf8codecvt.lo rot13codecvt.lo ucs2utf8codecvt.lo utf16beucs2codecvt.lo utf16leucs2codecvt.lo utf16utf8codecvt.lo utf8iso88591codecvt.lo utf8ucs2codecvt.lo XMLCharacterClasses.lo
...
ranlib .libs/libarabica.a
creating libarabica.la
(cd .libs && rm -f libarabica.la && ln -s ../libarabica.la libarabica.la)
make[1]: Leaving directory `/home/jez/work/arabica'

Readers with eidetic memories will spot that the long list of -D options passed to the compiler has gone[10], replaced by the ArabicaConfig.h header. Sharp-eyed readers will also spot the -lexpat in the linker options, the result of the AC_SUBST variable substitution.

Customising the Build - No Boost

The AC_DEFINE and AC_SUBST macros are the two most common ways to customise and configure your build, but sometimes they can't quite get you where you want to be.

Arabica consists of several different pieces, which stack on one another. At the bottom is a SAX layer, wrapping whichever library configure finds. On top of that is a DOM implementation, and on that is an XPath engine. These different layers also have different dependencies. The XPath engine uses Boost, while the other pieces don't. My configure script should check for Boost, as XPath can't be built without it. Its absence isn't completely critical though, since the other pieces can be built. In a case like this, a preprocessor #define, environment variable or text substitution isn't really going to help. It just isn't enough. What we need to be able to say is "if this condition applies, recurse the build into these subdirectories, or build these executables". Automake's AM_CONDITIONAL macro allows us to do exactly that.

At the end of my macro which checks for Boost[11], I have

m4/ax_boost_base.m4:

...
AM_CONDITIONAL([WANT_XPATH], [test "$want_xpath" = "yes"])

and in tests/Makefile.am I have

tests/Makefile.am:

SUBDIRS = Utils SAX DOM
if WANT_XPATH
  SUBDIRS += XPath
endif
...

Now, the build will only walk down into the XPath directory if the Boost libraries are found.
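
Under the hood the conditional is itself a substitution trick: Automake's convention is a pair of variables, NAME_TRUE and NAME_FALSE, one empty and the other set to '#', so that the guarded lines are either left alone or commented out. The sketch below imitates that idea; the helper name render_xpath_line and the template line are simplified stand-ins, not what a generated Makefile.in literally contains.

```shell
#!/bin/sh
# Hypothetical sketch of the idea behind AM_CONDITIONAL([WANT_XPATH], ...).
# render_xpath_line is an illustrative name; the template line is a
# simplified stand-in for the guarded lines in a generated Makefile.in.
# Usage: render_xpath_line yes|no
render_xpath_line() {
  if test "$1" = yes; then
    WANT_XPATH_TRUE=              # guard vanishes, the line stays active
  else
    WANT_XPATH_TRUE='#'           # guard turns the line into a comment
  fi
  printf '%s\n' '@WANT_XPATH_TRUE@SUBDIRS += XPath' |
    sed "s|@WANT_XPATH_TRUE@|$WANT_XPATH_TRUE|"
}

render_xpath_line yes   # prints: SUBDIRS += XPath
render_xpath_line no    # prints: #SUBDIRS += XPath
```

Since the decision is baked in when configure runs, changing your mind about XPath means re-running configure rather than passing a different flag to make.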

Build Targets for Free - Install

In addition to the build targets you specify, Autotools provides a number of additional targets. Among the most useful are install, dist and distcheck.

What is installed where is controlled by the Makefile.am primaries[12]. A file named in a primary is installed by copying the built file into the appropriate directory. So the hello program named in

bin_PROGRAMS = hello

would be installed in $(bindir). By default, $(bindir) is /usr/local/bin, but that can be changed with a configure command line parameter. Autotools can also install libraries and headers. Where the platform requires, the install target will relink and take care of any other platform specific jiggery-pokery.

Arabica is implemented mainly in header files (over 150 at time of writing), and I'm too pragmatic (or lazy, take your pick) to try and keep Makefile.ams up to date with a constantly changing list of files. Consequently, the built-in install target won't install my headers, because it doesn't know about them. Automake accounts for situations like this by providing hook points in the built-in targets. For Arabica, I can provide an install-data-local target in my Makefile.am, and it gets called at the right point in the install.

Makefile.am:

...

install-data-local:
      @echo "------------------------------------------------------------"
      @echo "Installing include files to $(includedir)"
      @echo "------------------------------------------------------------"
      for inc in `cd $(srcdir)/include && find . -type f -print | grep -v \.svn`; \
        do $(INSTALL_HEADER) -D "$(srcdir)/include/$$inc" "$(includedir)/$$inc"; \
      done

This is a straightforward Makefile fragment. It finds all the header files (skipping Subversion directories) and copies them into $(includedir). Autotools arranges for $(includedir) to point to the correct location, typically /usr/local/include.
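
The same loop can be tried outside make as an ordinary shell function, which is a convenient way to test it against a scratch directory. This is a sketch rather than a drop-in replacement: install_headers and the demo file names are made up for the example, and it uses mkdir -p plus cp instead of the install program's -D option, which should be more widely available.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the header install loop.
# install_headers SRC DEST copies every regular file under SRC into DEST,
# keeping the directory layout and skipping Subversion metadata.
install_headers() {
  install_headers_src=$1
  install_headers_dest=$2
  (cd "$install_headers_src" && find . -type f -print | grep -v '/\.svn/') |
  while IFS= read -r inc; do
    mkdir -p "$install_headers_dest/$(dirname "$inc")" &&
      cp "$install_headers_src/$inc" "$install_headers_dest/$inc"
  done
}

# Demo against throwaway directories (file names are examples only).
demo_src=$(mktemp -d) || exit 1
demo_dest=$(mktemp -d) || exit 1
mkdir -p "$demo_src/SAX" "$demo_src/.svn"
: > "$demo_src/SAX/XMLReader.hpp"
: > "$demo_src/.svn/entries"
install_headers "$demo_src" "$demo_dest"
find "$demo_dest" -type f -print
```

Pointing the destination at a scratch directory first is a cheap way to check that nothing unexpected, such as version control metadata, would end up under $(includedir).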

Automake also provides a matching uninstall target. Politeness dictates that if you use the install hook, you should also provide an uninstall hook in your Makefile.am.

Makefile.am:

...

uninstall-local:
      @echo "------------------------------------------------------------"
      @echo "Removing include files from $(includedir)"
      @echo "------------------------------------------------------------"
      for inc in `cd $(srcdir)/include && find . -type f -print | grep -v \.svn`; \
        do rm -rf "$(includedir)/$$inc"; \
      done
      for dir in `cd $(srcdir)/include && find . -type d -print | grep -v \.svn`; \
        do rm -rf "$(includedir)/$$dir"; \
      done

Automake includes a number of other built-in targets, including rules for running tests, maintaining file dependencies, and packaging source files. They all provide similar hook points.

Adding A Custom Target

In addition to the built-in targets and their hook points, it's also possible to add your own build targets. You simply add the target and its rules to Makefile.am. Arabica includes a target to build HTML class documentation.

Makefile.am:

...

docs:
      doxygen doc/arabica.dox
      @echo "------------------------------------------------------------"
      @echo "Generated documents to ./doc/html"
      @echo "------------------------------------------------------------"

Expected Benefits

My decision to move to a new build system was driven by the fact that my pile of Makefiles had become unworkable. They were increasingly difficult for me to maintain, and difficult for people to use. I chose Autotools expecting it to be able to fix both problems.

In short, it can. In the course of this article I've outlined macros which identify Arabica's prerequisites. Writing macros is straightforward, if indeed a quick Google doesn't find one for you. Autotools handles file extensions automatically, along with lots of other platform specific details I hadn't even considered.

Since Arabica is mainly implemented in header files, as a developer I was particularly keen to have dependencies tracked automatically. The generated Makefiles track dependencies extremely well. In the time I've been using Autotools I've never been caught with a bad build.

Although there was a bit of a ramp up to using Autotools, now that I have the relationship between configure.ac, configure, Makefile.ams and Makefiles clear in my head, I've found using it extremely easy. Adding new executables to the build takes only a few minutes. Modifying the build to include or exclude certain pieces I've also found to be straightforward to implement. Since build options can be exposed as configure script options, they are much easier for Arabica users to access. They don't need to fish around in a Makefile, they can just pass an option to configure.

My experience with building Arabica on different platforms, and the reports I've had, tell me that a package that uses Autotools stands an extremely good chance of building on some arbitrary machine. Importantly, the configure script allows problems to be identified and reported before we even attempt to compile or link. A message which says "Cannot find an XML parser library. Arabica needs one of Expat, LibXML2 or Xerces" is much clearer than a screen full of compiler errors caused by a missing header file, or an unresolved link error.

For all the criteria I set myself, switching to Autotools has been a success.

Unexpected Benefits

Since my initial release of an autoconfiscated Arabica package in September 2006, I've discovered a number of other benefits which I either hadn't expected or even considered.

One thing I hadn't expected is that I'm finding Makefile.am files much easier to maintain than Makefiles. Arabica is under relatively energetic development, and so I'm adding new things to the build reasonably often. Despite years of working with Makefiles, I could rarely get one right first time. Makefile.ams are much more concise, and I get them right more often than not. Once I've added a new source file, say, Autotools also takes care of dependency tracking, installation, and source file packaging for me. I save time both because updating the Makefile.am is easier, and because it does more work for me.

As someone with a history of producing not-quite-correct tarballs, I've found the built-in dist and distcheck targets to be invaluable. The dist target creates source tar.gz, tar.bz2 and zip files, using the Makefile dependencies. The distcheck target provides extra peace of mind. It bundles up the source files, then unbundles them into a temporary directory and tries to build the package. I love it.

According to Sourceforge's statistics, Arabica is now getting more downloads. A new release always did bring a spike in traffic, but I do seem to be seeing a sustained increase in the number of downloads. I don't know who or where most of these new downloaders are or what they're doing, but that's part of the fun.

Autotools makes cross-compilation straightforward. People are building Arabica for embedded platforms, with some success. I'm pretty sure my previous system, if not actively hostile to cross-compilation, didn't make things any easier.

When a build fails, the emails I have had show that people tend to blame themselves. They write emails containing phrases like "I'm sure it something I've done", and "if you could point me in the right direction". Prior to autoconfiscating Arabica, I received perhaps five emails in six years about getting the package built on new platforms. Since autoconfiscating Arabica, I've received six in the last six months. What's more, in every case but one, getting the build going has been straightforward even without my having access to the platform in question.

What this shows, I think, is that people find the ./configure; make; make install incantation comforting. It sends a message that the package author knows what they're doing, and that sends good messages about the package itself. Arabica's initial impressions are now good, rather than off-putting. A reliable build means people can concentrate on finding out whether Arabica can actually help them do what they want. If the build doesn't work for whatever reason, people feel it can be fixed, simply because they see Autoconf builds working so often.

So was it worth it? Would I do it again?

Yes and yes, although I wouldn't necessarily recommend it universally or unequivocally. For existing systems that work already, I wouldn't recommend you change simply for the sake of changing. For systems where portability isn't a consideration, or where the build is straightforward, I wouldn't necessarily consider Autotools as my only choice.

For something like Arabica, code intended to be built on a number of different platforms, I think I would now reach for Autotools first. For systems where the build is fluid, where things are coming in and out of the build often, I'd consider Autotools because I've found it easy to maintain. In cases where the build needs to be customisable, for whatever reason (missing header file, basic vs full options, etc.), I'd also consider Autotools a strong candidate.

And it's Goodnight from me

And that's more or less it. In the course of these three articles, I've sprinted through my migration to Autoconf. I've discussed what Autoconf is and what it does. I've given examples of common Autoconf operations. Finally, this month, I've looked at examples of the various ways to modify the build through preprocessor symbols, variable substitutions in Makefiles, conditional targets, and by customising built-in targets.

My experience with Autotools has been, and continues to be, good. Should you choose Autotools in the future, hopefully these articles will help you have a similar experience.


Jez Higgins