Article originally published in CVu 18-6. CVu is the journal of ACCU.
Arabica is an XML toolkit written in C++. It provides a SAX interface for streaming XML parsing, a DOM interface for in-memory XML processing, and an XPath engine for easy DOM access. In the next release or two, it will add an XSLT processor. Arabica supports std::string, std::wstring, or pretty much any other crazy string class. The code itself is good, honest, standard C++, which my experience shows is highly portable. I've built Arabica on Windows using Visual C++, under Cygwin, on a variety of Linux flavours, FreeBSD, several types of Solaris, OS X, and GNU Darwin. It's quite a tidy package, and if you're working with XML in C++ you should consider it. That's what I think, anyway, but then I did write it.
Damon, in Montreal, disagrees. On 22 February 2005 he wrote:
"The C++ port of SAX (a set of standard JAVA API for XML parsing) Arabica is totally unusable, there are even syntax errors like namespace errors in the socalled stable release, besides it does not have any reference manual!"
"It is an awful library with a bunch of syntax errors in its newest release (namespace error....), no documents available at all. Fail to compile it on Linux at all."
I didn't have any correspondence with Damon. I found his comments months later during a vanity-searching session on Technorati. In the nearly 8 years since Arabica's initial release, it's been my experience that people very rarely write to you about your software. When they download your code, it either works or it doesn't. If it works, they get on with what they're doing. If it doesn't, they may take a moment to think you're an idiot, then they fling it away and try something else. More often than not, that first point of failure is the build. If it doesn't build, it's fallen at the very first hurdle.
The Arabica distribution currently contains around 150 source files. Since Arabica is largely implemented as C++ templates, the majority of the files don't need to be compiled and built separately. You just include them into your code. Only a handful, under 20, need to be built into a shared or static library.
Why did Damon have such a hard time? Why didn't I?
Come with me. Come with me on a journey through time.
When I started on the code that would become Arabica, I was an angry man. I was having a very bad experience at work, with a rotten developer, who had handed over a horrible, verbose, bug-ridden piece of code. It was, allegedly, an XML parser. It read an XML wire-format and built a C++ object graph, what we now call deserialisation. At the time, I called it rubbish. (I know everybody has "he was awful" war stories, but this was, genuinely, one of the worst experiences of my working life. I'm getting angry just thinking about it again.)
I had argued that we shouldn't, absolutely shouldn't, build our own parser, but use one of the free parsers that were, even then, already available. When I had this lump of code dropped on me, I wanted to demonstrate just how awful it was. I grabbed the Expat source and got the build going on Windows. Next, I grabbed the recently released Java SAX interfaces, and ran through them search-and-replacing the Java strings with std::string. (SAX describes a streaming XML parser interface. It was initially developed to provide a common interface to XML parsers written in Java (as JDBC provides a common interface to databases), but there are now implementations in most languages.) That done, I hooked up Expat, which is a C library that deals in char*, to my new SAX classes. It worked. No bugs. Not bad for an afternoon's work. I released the code as an afterthought. I didn't think it was of particular interest, but the code I'd based it on was freely available, and I needed something to put on my website.
Over the subsequent months, and then years, I continued to work on Arabica on and off. There was a new version of SAX, which I incorporated. Other C and C++ XML parsers were released and I wrote SAX wrappers for them.
For most of that time, my primary development platform was Visual C++ 6 and then 7. Every now and again, I'd boot up a Linux box, refresh the Makefiles, and clean up the conformance errors GCC pointed out. It worked OK, after a fashion.
As the library grew, the build became increasingly fiddly. While Arabica provided bindings for Expat, libxml2, Xerces, and MSXML, you'd only want to build against one of those. That implies a certain amount of Makefile editing. I found out that some compiler/operating system combinations didn't support std::wstring, so parts of the build had to be conditionally excluded. C++ libraries have different levels of Standards conformance, and there are ambiguities in some places, so parts of my code had to be conditionally included to plug the gaps. Some platforms put things in different places, or expect certain types of files to have certain extensions, which meant more Makefile editing. Shared libraries, for example, generally have .so extensions. Cygwin, however, uses .dll, while OS X and other Darwin derivatives use .dylib.
Arabica's build notes at the time went something like this:

Ideally, you'd just run ./configure and be done with it. Unfortunately, at the moment the dark recesses of template meta-programming are as nothing to getting autoconf going. One day ... So anyway, we have to resort to a little Makefile fiddling. What I'm going to describe is probably GNU Make specific, but for other Make variants you should be able to follow along OK.

The top-level Makefile builds everything. It uses the -include directive to pull in Makefile.header, which is where all the twiddly bits are.

Open Makefile.header in your favourite editor. Most of it should be pretty obvious - defining CXX to point to your C++ compiler and so on. There are some examples in the distribution you can use as a base.

PARSER_CONFIG controls which parsers Arabica will use, and also whether to compile in wide character support. For each parser you want, add the corresponding -DUSE_parser. The choices are USE_EXPAT, USE_LIBXML2, and USE_XERCES. If you don't need, or your platform+compiler doesn't support, wide characters (eg. Cygwin, gcc on Solaris), you'll also need to set -DARABICA_NO_WCHAR_T. For each parser you support, add the appropriate library to ./bin. If your parser's header files aren't installed in the usual places (/usr/local/include or whatever the default is for your platform), you'll have to edit the Makefile. Once libSAX is built, everything else should build too.

These Makefiles work for me using gcc on SuSE Linux 7.3, Cygwin, and Solaris 7. If you can supply a Makefile.header for a new platform+compiler, I'd be delighted to receive it.
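Pulling those notes together, a Makefile.header for, say, gcc on Linux with Expat might have looked something like this. Only CXX, PARSER_CONFIG, and the defines are named in the notes above; the other variable names here are guesses at the shape, not the real file.

```make
# Illustrative Makefile.header - variable names other than CXX and
# PARSER_CONFIG are guesses at the shape, not the real file.
CXX = g++

# Pick a parser: USE_EXPAT, USE_LIBXML2, or USE_XERCES.
PARSER_CONFIG = -DUSE_EXPAT

# On platforms without wide-character support (Cygwin, gcc on Solaris):
# PARSER_CONFIG += -DARABICA_NO_WCHAR_T

# Where the parser's headers and libraries live, if not the defaults.
INCLUDES = -I/usr/local/include
LIBS = -L/usr/local/lib -lexpat

# Shared-library extension varies by platform: .so on most Unixes,
# .dll under Cygwin, .dylib on OS X and other Darwin derivatives.
SO_EXT = .so
```

Every line of that is a decision the user has to make before anything will build, which is precisely the problem.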
You can see I had made some effort to ease this process. GNU Make supports an include mechanism, so I had moved all the platform specific pieces out into a separate
Makefile fragment. This minimized the number of places that needed to be edited, but there was still a good deal of manual intervention required. I supplied a number of platform-specific versions, 6 at last count, but I didn't have regular access to all of the platforms in question. Note also the equivocation - "should be pretty obvious", "should build too", "work for me". It wasn't reliable, and I knew it.
It was a maintenance bother too. As I added more test and example programs, I had more Makefiles to maintain. When I added the XPath engine, which uses Boost Spirit, I received emails from people who didn't need XPath, asking me how to leave it out of the build, as their builds were now broken.
I had this code, code I knew was good and portable and useful, but I had this cruddy, wobbly, unreliable build system that had accreted around the outside. It was awkward for me, off-putting for other people. At least one person thought I was a useless idiot. Something had to change.
Out with Makefiles
I needed an alternative to my motley collection of Makefile bits and pieces. At the very least, it had to be more reliable, and less work to maintain, than what I had.
There are now many alternatives to Make. There's Ant, its Groovy derivative Gant, and its .NET-alike Nant. There's Cons and Scons. There's Jam and BJam. There's Rake and A-A-P and a whole host more I'd not even heard of. If you look at any of these tools, chances are there's at least a passing reference to how much better than Make it is.
I didn't consider any of them, not even for a moment.
If you download some arbitrary program or library written in C or C++, from SourceForge, Tigris, Savannah, or wherever, it won't, as a rule, use any of those tools. Chances are pretty good that it won't need anything like the fiddling Arabica did. You expect something like this:
$ wget http://somewhere/path/to/somelib.tar.gz
$ tar zxf somelib.tar.gz
$ cd somelib
$ ./configure
[... lots of output snipped ...]
$ make
[... lots more output snipped ...]
$ make install
[... a little bit more output, also snipped ...]
In With Makefile.am
The magic of ./configure; make; make install is provided by GNU Autotools. Autotools is actually three separate packages - autoconf, automake, and libtool. Autoconf creates portable, configurable packages, via the configure script. Automake is a Makefile generator, used with autoconf to produce Makefiles based on what configure finds out about the system. Libtool is a set of shell scripts to build shared libraries in a generic fashion. In reality, you don't use one without the others. As far as I can tell, Autotools is not an official name, but everybody knows what it means.
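To make the division of labour concrete, here is roughly what the two input files look like for a trivial shared-library project. This is a hedged sketch, not Arabica's actual files: autoconf (with automake) turns configure.ac into the configure script, automake turns Makefile.am into Makefile.in, and configure then turns Makefile.in into the final Makefile.

```
# configure.ac - processed by autoconf to produce the configure script
AC_INIT([somelib], [1.0])
AM_INIT_AUTOMAKE([foreign])
AC_PROG_CXX
LT_INIT                          # libtool support for shared libraries
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

# Makefile.am - processed by automake to produce Makefile.in
lib_LTLIBRARIES = libsome.la
libsome_la_SOURCES = some.cpp
```

Note how little is there: the platform-specific knowledge that filled my Makefile.header - compiler names, library extensions, header locations - is discovered by configure at build time rather than written down by hand.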
You might have noticed a disparaging reference to configure in the build notes above. I've actually been here before. Six years ago, I attempted to convert Arabica as it then was to use Autotools. Even armed with a hot-off-the-press copy of New Riders' GNU Autoconf, Automake, and Libtool - a book written by the primary Autotools maintainers - I made absolutely no progress at all. I found the whole process so dispiriting and confusing that I abandoned my efforts, subsequently resigning myself to years of creaking Makefiles and the contempt of Damon from Montreal.
Six years is a long time in programming. Despite my previous bad experience, I had no doubts that I would, in relatively short order, autoconfiscate my project.