Adventures in Autoconfiscation - Part One of Three
Arabica is an XML toolkit written in C++. It provides a SAX
interface for streaming XML parsing, a DOM interface for in-memory XML
processing, and an XPath engine for easy DOM access. In the next
release or two, it will add an XSLT processor. Arabica supports
std::wstring or pretty much any other crazy string class.
The code itself is good, honest, standard C++, which my experience
shows is highly portable. I've built Arabica on Windows using Visual
C++, under Cygwin, on a variety of Linux flavours, FreeBSD, several
types of Solaris, OS X, and GNU Darwin. It's quite a tidy package,
and if you're working with XML in C++ you should consider it. That's
what I think, anyway, but I did write it.
Damon, in Montreal, disagrees. On 22 February 2005 he wrote
"The C++ port of SAX (a set of standard JAVA API for XML parsing) Arabica is totally unusable, there are even syntax errors like namespace errors in the socalled stable release, besides it does not have any reference manual!"
"It is an awful library with a bunch of syntax errors in its newest release (namespace error....), no documents available at all. Fail to compile it on Linux at all."
I didn't have any correspondence with Damon. I found his comments months later during a vanity searching session on Technorati. In the nearly 8 years since Arabica's initial release, it's been my experience that people very rarely write to you about your software. When they download your code, it either works or it doesn't. If it works, they get on with what they were doing. If it doesn't, they may take a moment to think you're an idiot, then they fling it away and try something else. More often than not, that first point of failure is the build. If it doesn't build, it's fallen at the very first hurdle.
The Arabica distribution contains, currently, around 150 source files. Since Arabica is largely implemented as C++ templates, the majority of the files don't need to be compiled and built separately. You just include them into your code. Only a handful, under 20, need to be built into a shared or static library.
Why did Damon have such a hard time? Why didn't I?
Come with me. Come with me on a journey through time.
When I started on the code that would become Arabica, I was an angry man. I was having a very bad experience at work, with a rotten developer, who had handed over a horrible, verbose, bug-ridden piece of code. It was, allegedly, an XML parser. It read an XML wire-format and built a C++ object graph, what we now call deserialisation. At the time, I called it rubbish. (I know everybody has "he was awful" war stories, but this was, genuinely, one of the worst experiences of my working life. I'm getting angry just thinking about it again.)
I had argued that we shouldn't, absolutely shouldn't, build our own
parser, but use one of the free parsers that were, even then, already
available. When I had this lump of code dropped on me, I wanted to
demonstrate just how awful it was. I grabbed the Expat source and
got the build going on Windows. Next, I grabbed the recently released
Java SAX interfaces, and ran through them, search-and-replacing String
with std::string. (SAX describes a streaming XML parser
interface. It was initially developed to provide a common interface to
XML parsers written in Java (as JDBC provides a common interface to
databases), but there are now implementations in most languages.) That
done, I hooked up Expat, which is a C library that deals in char*, to
my new SAX classes. It worked. No
bugs. Not bad for an afternoon's work. I released the code as an
afterthought. I didn't think it was of particular interest, but the
code I'd based it on was freely available, and I needed something to
put on my website.
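The hook-up described above, a C library that deals in char* feeding classes that deal in std::string, might look something like the following. This is a minimal sketch and not Arabica's actual code: the ContentHandler interface is cut down to two methods, fakeCParse is a hypothetical stand-in for the C parser (it is not Expat's API), and all of the names are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// SAX-style handler: where the Java interface takes String, this takes std::string.
class ContentHandler {
public:
    virtual ~ContentHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Callback signatures in the style a C parser expects: an opaque
// user-data pointer plus NUL-terminated char* names.
typedef void (*StartCallback)(void* userData, const char* name);
typedef void (*EndCallback)(void* userData, const char* name);

// Trampolines: recover the C++ handler from the opaque pointer and let
// the const char* convert to std::string at the call.
void onStart(void* userData, const char* name) {
    assert(userData != 0);
    static_cast<ContentHandler*>(userData)->startElement(name);
}

void onEnd(void* userData, const char* name) {
    assert(userData != 0);
    static_cast<ContentHandler*>(userData)->endElement(name);
}

// Hypothetical stand-in for the C parser: fires the events for
// <doc><item/></doc> without doing any real parsing.
void fakeCParse(void* userData, StartCallback start, EndCallback end) {
    start(userData, "doc");
    start(userData, "item");
    end(userData, "item");
    end(userData, "doc");
}

// A handler that simply records the events it receives.
class RecordingHandler : public ContentHandler {
public:
    std::vector<std::string> events;
    void startElement(const std::string& name) { events.push_back("start:" + name); }
    void endElement(const std::string& name) { events.push_back("end:" + name); }
};

// Drive the stand-in parser with a C++ handler. The pointer is passed
// as ContentHandler* so the cast back in the trampolines is exact.
void parseInto(ContentHandler& handler) {
    ContentHandler* base = &handler;
    fakeCParse(base, onStart, onEnd);
}
```

Calling parseInto on a RecordingHandler leaves four entries in its events vector, one per callback. With a real C parser the trampolines would be registered through that library's own callback-setting functions instead.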
Over the subsequent months, and then years, I continued to work on Arabica on and off. There was a new version of SAX, which I incorporated. Other C and C++ XML parsers were released and I wrote SAX wrappers for them.
For most of that time, my primary development platform was Visual C++
6 and then 7. Every now and again, I'd boot up a Linux box, refresh
Makefiles, and clean up the conformance errors GCC pointed out.
It worked ok, after a fashion.
As the library grew, the build became increasingly fiddly. While
Arabica provided bindings for Expat, libxml2, Xerces, and MSXML, you'd
only want to build against one of those. That implies a certain amount
of Makefile editing. I found out that some compiler/operating
system combinations didn't support std::wstring, so parts of the build
had to be conditionally excluded. C++ libraries have different levels
of Standards conformance, and there are ambiguities in some places, so
parts of my code have to be conditionally included to plug the gaps.
Some platforms put things in different places, or expect certain types
of files to have certain extensions, which needs more
editing. Shared libraries, for example, generally have .so
extensions. Cygwin, however, uses .dll, while OS X and other
Darwin derivatives use .dylib.
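The file extension differences amount to a small lookup table that something in the build has to own. As a rough illustration only, here is that table written as C++, with a best-effort guess at the current platform from the compiler's predefined macros (__CYGWIN__ and __APPLE__). The enum and function names are made up for this sketch. Arabica's build kept this knowledge in its Makefiles, not in code.

```cpp
#include <cassert>
#include <string>

// The platform families mentioned in the text.
enum Platform { GenericUnix, Cygwin, Darwin };

// Shared library extension for each platform family.
inline std::string sharedLibraryExtension(Platform platform) {
    switch (platform) {
        case Cygwin: return ".dll";
        case Darwin: return ".dylib";
        default:     return ".so";
    }
}

// Best-effort guess at the platform from compiler-predefined macros.
// Anything unrecognised is treated as a generic Unix.
inline Platform currentPlatform() {
#if defined(__CYGWIN__)
    return Cygwin;
#elif defined(__APPLE__)
    return Darwin;
#else
    return GenericUnix;
#endif
}

// Full file name for a library stem such as "libSAX".
inline std::string sharedLibraryName(const std::string& stem, Platform platform) {
    assert(!stem.empty());
    return stem + sharedLibraryExtension(platform);
}
```

Every entry added to a table like this is another platform somebody has to test, which is a large part of why hand-maintained platform lists become fiddly.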
- Building Arabica isn't hard, but it can be a little fiddly.
- First, you will need to have at least one of the following parsers installed - expat, libxml, Xerces. If you're working on a Linux box, you probably have libxml or expat already installed. It's entirely possible to build in support for several parsers, but you'll probably only want one.
- Next you need to build the SAX library, configuring it for your choice of parser, or parsers.
- In an ideal world you'd just do ./configure and be done with it. Unfortunately, at the moment the dark recesses of template meta-programming are as nothing to getting autoconf going. One day ... So anyway, we have to resort to a little Makefile fiddling. What I'm going to describe is probably GNU Make specific, but for other Make variants you should be able to follow along ok.
- Choose your parser (or parsers) as detailed above.
- You'll need a relatively Standards compliant C++ compiler and library - gcc 3.x.y is okay, gcc 2.95.* will probably work if you use an alternative library such as STLPort.
- Untar the Arabica source.
- At the top level directory, you'll find a Makefile which builds everything. It uses the -include directive to pull in Makefile.header, which is where all the twiddly bits are.
- Pull up Makefile.header in your favourite editor. Most of it should be pretty obvious - defining CXX to point to your C++ compiler and so on. There are some examples in the distribution you can use as a base.
- The interesting PARSER_CONFIG controls which parsers Arabica will use, and also whether to compile in wide character support. For each parser you want to configure as -DUSE_parser. The choices are USE_XERCES. If you don't need, or your platform+compiler doesn't support wide characters (eg. Cygwin, gcc on Solaris) you'll also need to set -DARABICA_NO_WCHAR_T. For each parser you support, add the appropriate
- Run make. libSAX should build, possibly with a number of warnings about preprocessor tokens, and finish up in ./bin. If your parser's header files aren't installed in the usual places (/usr/local/include or whatever the default is for your platform), you'll have to edit Makefile. Once libSAX is built, everything else should build too.
- The supplied Makefiles work for me using gcc on Suse Linux 7.3, Cygwin and Solaris 7. If you can supply a Makefile.header for a new platform+compiler, I'd be delighted to receive it.
- Once the SAX library is built, the DOM library is simplicity itself. You don't have to do anything! Arabica's DOM implementation is all header files. If you want to use it, just include the appropriate parts, link the SAX library, and you're done.
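To give a flavour of what flags like -DUSE_XERCES and -DARABICA_NO_WCHAR_T do once they reach the compiler, here is a small sketch of code gated on them. The two macro names come from the build notes above. The functions are hypothetical, written only for this illustration, and are not taken from Arabica's source.

```cpp
#include <string>
#include <vector>

// Parser back ends selected on the compiler command line, e.g. -DUSE_XERCES.
// Other back ends would follow the same -DUSE_parser pattern.
inline std::vector<std::string> configuredParsers() {
    std::vector<std::string> names;
#ifdef USE_XERCES
    names.push_back("Xerces");
#endif
    return names;
}

// True unless the build was given -DARABICA_NO_WCHAR_T.
inline bool wideCharactersEnabled() {
#ifdef ARABICA_NO_WCHAR_T
    return false;
#else
    return true;
#endif
}

#ifndef ARABICA_NO_WCHAR_T
// Only compiled where std::wstring is usable. This is a naive widening
// that is only meaningful for plain ASCII input.
inline std::wstring widen(const std::string& narrow) {
    return std::wstring(narrow.begin(), narrow.end());
}
#endif
```

Compiled with no flags at all, configuredParsers() returns an empty list and the wide character code is included. Getting the right combination of flags for a given machine was left to whoever was editing Makefile.header.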
You can see I had made some effort to ease this process. GNU Make
supports an include mechanism, so I had moved all the platform
specific pieces out into a separate
Makefile fragment. This minimized
the number of places that needed to be edited, but there was still a
deal of manual intervention required. I supplied a number of platform
specific versions, 6 at last count, but I didn't have regular access
to all of the platforms in question. Note also the equivocation -
"that should be it", "will probably work", "works for me". It wasn't
reliable, and I knew it.
It was a maintenance bother too. As I added more test and example
programs, I had more
Makefiles to maintain. When I added the XPath
engine, which uses Boost Spirit, I received emails from people who
didn't need XPath asking me how to leave it out of the build, as
their builds were now broken.
I had this code, code I knew was good and portable and useful, but I had this cruddy, wobbly, unreliable build system that had accreted around the outside. It was awkward for me, off-putting for other people. At least one person thought I was a useless idiot. Something had to change.
Out with Makefiles
I needed an alternative to my motley collection of
Makefile bits and
pieces. At the very least it had to meet the following criteria:
- be able to find Arabica's prerequisites - at least an XML parser and optionally Boost
- identify whether
- detect platform specific file extensions
- track file dependencies
- be at least as easy to maintain as my existing setup
- stand a better than even chance of working on the random machine that somebody has just downloaded my code to
There are now many alternatives to Make. There's Ant, its Groovy derivative Gant, and its .NET-alike Nant. There's Cons and Scons. There's Jam and BJam. There's Rake and A-A-P and a whole host more I'd not even heard of. If you look at any of these tools, chances are there's at least a passing reference to how much better than Make it is.
I didn't consider any of them, not even for a moment.
If you download some arbitrary program or library written in C or C++, from Sourceforge, Tigris, Savannah, or wherever, it won't, as a rule, use any of those tools. Chances are pretty good that it won't need anything like the fiddling that Arabica did. You expect something like this:
$ wget http://somewhere/path/to/somelib.tar.gz
$ tar zxf somelib.tar.gz
$ cd somelib
$ ./configure
[... lots of output snipped ...]
$ make
[... lots more output snipped ...]
$ make install
[... a little bit more output, also snipped ...]
In With Makefile.am
The magic of ./configure; make; make install is provided by GNU
Autotools. Autotools is actually three separate packages - autoconf,
automake, and libtool. Autoconf creates a portable and configurable
configure script. Automake is a tool used with autoconf to produce
Makefiles based on what configure finds out about the system. Libtool
is a set of shell scripts to build shared libraries in a generic
fashion. In reality, you don't use one without the others. As far as I
can tell Autotools is not an official name, but everybody knows what
it means.
You might have noticed a disparaging reference to configure in the
build notes above. I've actually been here before. Six years ago, I
attempted to convert Arabica as was to use Autotools. Even armed with
a hot off the press copy of New Riders' GNU Autoconf, Automake, and
Libtool - a book written by the primary Autotools maintainers - I
made absolutely no progress at all. I found the whole process so
dispiriting and confusing that I abandoned my efforts, subsequently
consigning myself to years of creaking
Makefiles and the contempt of
Damon from Montreal.
Six years is a long time in programming. Despite my previous bad experience, I had no doubts that I would, in relatively short order, autoconfiscate my project.