Freelance software grandad
software created
extended or repaired
Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations
Article originally published in CVu 18-6. CVu is the journal of ACCU.
Arabica is an XML toolkit written in C++. It provides a SAX
interface for streaming XML parsing, a DOM interface for in-memory XML
processing, and an XPath engine for easy DOM access. In the next
release or two, it will add an XSLT processor. Arabica supports
std::string, std::wstring, or pretty much any other crazy string class.
The code itself is good, honest, standard C++, which my experience
shows is highly portable. I've built Arabica on Windows using Visual
C++, under Cygwin, on a variety of Linux flavours, FreeBSD, several
types of Solaris, OS X, and GNU Darwin. It's quite a tidy package,
and if you're working with XML in C++ you should consider it. That's
what I think, anyway, but then I did write it.
Damon, in Montreal, disagrees. On 22 February 2005 he wrote
"The C++ port of SAX (a set of standard JAVA API for XML parsing) Arabica is totally unusable, there are even syntax errors like namespace errors in the socalled stable release, besides it does not have any reference manual!"
"It is an awful library with a bunch of syntax errors in its newest release (namespace error....), no documents available at all. Fail to compile it on Linux at all."
I didn't have any correspondence with Damon. I found his comments months later during a vanity searching session on Technorati. In the nearly 8 years since Arabica's initial release, it's been my experience that people very rarely write to you about your software. When they download your code, it either works or it doesn't. If it works, they get on with what they were doing. If it doesn't, they may take a moment to think you're an idiot, then they fling it away and try something else. More often than not, that first point of failure is the build. If it doesn't build, it's fallen at the very first hurdle.
The Arabica distribution contains, currently, around 150 source files. Since Arabica is largely implemented as C++ templates, the majority of the files don't need to be compiled and built separately. You just include them into your code. Only a handful, under 20, need to be built into a shared or static library.
Why did Damon have such a hard time? Why didn't I?
Come with me. Come with me on a journey through time.
When I started on the code that would become Arabica, I was an angry man. I was having a very bad experience at work, with a rotten developer, who had handed over a horrible, verbose, bug-ridden piece of code. It was, allegedly, an XML parser. It read an XML wire-format and built a C++ object graph, what we now call deserialisation. At the time, I called it rubbish. (I know everybody has "he was awful" war stories, but this was, genuinely, one of the worst experiences of my working life. I'm getting angry just thinking about it again.)
I had argued that we shouldn't, absolutely shouldn't, build our own parser, but use one of the free parsers that were, even then, already available. When I had this lump of code dropped on me, I wanted to demonstrate just how awful it was. I grabbed the Expat source and got the build going on Windows. Next, I grabbed the recently released Java SAX interfaces, and ran through them, searching for String and replacing it with std::string. (SAX describes a streaming XML parser interface. It was initially developed to provide a common interface to XML parsers written in Java (as JDBC provides a common interface to databases), but there are now implementations in most languages.) That done, I hooked up Expat, which is a C library that deals in char*, to my new SAX classes. It worked. No bugs. Not bad for an afternoon's work. I released the code as an afterthought. I didn't think it was of particular interest, but the code I'd based it on was freely available, and I needed something to put on my website.
Over the subsequent months, and then years, I continued to work on Arabica on and off. There was a new version of SAX, which I incorporated. Other C and C++ XML parsers were released and I wrote SAX wrappers for them.
For most of that time, my primary development platform was Visual C++
6 and then 7. Every now and again, I'd boot up a Linux box, refresh
the Makefile
s, and clean up the conformance errors GCC pointed out.
It worked ok, after a fashion.
As the library grew, the build became increasingly fiddly. While Arabica provided bindings for Expat, libxml2, Xerces, and MSXML, you'd only want to build against one of those. That implies a certain amount of Makefile editing. I found out that some compiler/operating system combinations didn't support std::wstring, so parts of the build had to be conditionally excluded. C++ libraries have different levels of Standards conformance, and there are ambiguities in some places, so parts of my code have to be conditionally included to plug the gaps. Some platforms put things in different places, or expect certain types of files to have certain extensions, which needs more Makefile editing. Shared libraries, for example, generally have .so extensions. Cygwin, however, uses .dll, while OS X and other Darwin derivatives use .dylib.
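To give a flavour of the platform differences, here is a minimal shell sketch of the extension lookup just described. The helper name shared_lib_ext is made up for illustration and is not part of Arabica; the system names are those printed by uname -s, and only the three extensions mentioned above are covered.

```shell
# Map an operating system name (as printed by `uname -s`) to the
# shared-library extension mentioned in the text.
# shared_lib_ext is a hypothetical helper, not part of Arabica.
shared_lib_ext() {
    # Use the argument if one is given, otherwise ask the running system.
    system="${1:-$(uname -s)}"
    case "$system" in
        CYGWIN*) echo "dll" ;;
        Darwin*) echo "dylib" ;;
        *)       echo "so" ;;
    esac
}

# Print the library file name for the current platform.
echo "libSAX.$(shared_lib_ext)"
```

Every special case of this kind was one more thing to remember when editing a Makefile by hand.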
At the time Damon was discarding Arabica as completely unusable, my build notes were:

[...] ./configure and be done with it. Unfortunately, at the moment the dark recesses of template meta-programming are as nothing to getting autoconf going. One day ... So anyway, we have to resort to a little Makefile fiddling. What I'm going to describe is probably GNU Make specific, but for other Make variants you should be able to follow along ok.

[...] Makefile which builds everything. It uses the -include directive to pull in Makefile.header, which is where all the twiddly bits are.

[...] Makefile.header in your favourite editor. Most of it should be pretty obvious - defining CXX to point to your C++ compiler and so on. There are some examples in the distribution you can use as a base.

[...] Makefile.header macro is PARSER_CONFIG. PARSER_CONFIG controls which parsers Arabica will use, and also whether to compile in wide character support. For each parser you want to configure as -DUSE_parser. The choices are USE_EXPAT, USE_LIBXML2 and USE_XERCES. If you don't need, or your platform+compiler doesn't support wide characters (eg. Cygwin, gcc on Solaris) you'll also need to set -DARABICA_NO_WCHAR_T. For each parser you support, add the appropriate -lwhatever (-lexpat, -lxerces-c, -lxml2) to DYNAMIC_LIBS.

[...] ./bin. If your parser's header files aren't installed in the usual places (/usr/include, /usr/local/include or whatever the default is for your platform), you'll have to edit INCS_DIRS in the Makefile. Once libSAX is built, everything else should build too.

[...] Makefiles work for me on using gcc on Suse Linux 7.3, Cygwin and Solaris 7. If you can supply a Makefile.header for a new platform+compiler, I'd be delighted to receive it.

You can see I had made some effort to ease this process. GNU Make supports an include mechanism, so I had moved all the platform specific pieces out into a separate Makefile fragment. This minimized the number of places that needed to be edited, but there was still a deal of manual intervention required. I supplied a number of platform specific versions, 6 at last count, but I didn't have regular access to all of the platforms in question. Note also the equivocation - "that should be it", "will probably work", "works for me". It wasn't reliable, and I knew it.
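The hand editing those notes asked for boils down to a small lookup table. The sketch below works out the PARSER_CONFIG and DYNAMIC_LIBS values for one parser, using only the defines and linker flags named in the notes. The function names and the "no-wchar" argument are invented for illustration; nothing like this shipped with Arabica, and the output would still have needed pasting into Makefile.header.

```shell
# Print the PARSER_CONFIG define for one parser back end.
# All function names here are hypothetical helpers.
parser_define() {
    case "$1" in
        expat)   echo "-DUSE_EXPAT" ;;
        libxml2) echo "-DUSE_LIBXML2" ;;
        xerces)  echo "-DUSE_XERCES" ;;
        *)       echo "unknown parser: $1" >&2; return 1 ;;
    esac
}

# Print the matching linker flag for DYNAMIC_LIBS.
parser_lib() {
    case "$1" in
        expat)   echo "-lexpat" ;;
        libxml2) echo "-lxml2" ;;
        xerces)  echo "-lxerces-c" ;;
        *)       echo "unknown parser: $1" >&2; return 1 ;;
    esac
}

# Print both settings for the chosen parser. Pass "no-wchar" as the
# second argument on platforms without wide character support.
make_settings() {
    define=$(parser_define "$1") || return 1
    lib=$(parser_lib "$1") || return 1
    if [ "${2:-}" = "no-wchar" ]; then
        define="$define -DARABICA_NO_WCHAR_T"
    fi
    echo "PARSER_CONFIG=$define"
    echo "DYNAMIC_LIBS=$lib"
}

# Example: Expat on a platform without std::wstring.
make_settings expat no-wchar
```

Even in this tidy form, somebody still has to know which parser is installed and whether wide characters work, which is exactly the knowledge the build should have been finding out for itself.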
It was a maintenance bother too. As I added more test and example programs, I had more Makefiles to maintain. When I added the XPath engine, which uses Boost Spirit, I received emails from people who didn't need XPath asking me how to leave it out of the build, as their builds were now broken.
I had this code, code I knew was good and portable and useful, but I had this cruddy, wobbly, unreliable build system that had accreted around the outside. It was awkward for me, off-putting for other people. At least one person thought I was a useless idiot. Something had to change.
Out with Makefiles
I needed an alternative to my motley collection of Makefile bits and pieces. At the very least it had to meet the following criteria:

[...] std::wstring was supported

There are now many alternatives to Make. There's Ant, its Groovy derivative Gant, and its .NET-alike NAnt. There's Cons and SCons. There's Jam and BJam. There's Rake and A-A-P and a whole host more I'd not even heard of. If you look at any of these tools, chances are there's at least a passing reference to how much better than Make it is.
I didn't consider any of them, not even for a moment.
If you download some arbitrary program or library written in C or C++, from Sourceforge, Tigris, Savannah, or wherever, it won't, as a rule, use any of those tools. Chances are pretty good that it won't need anything like the fiddling that Arabica did. You expect something like this:
$ wget http://somewhere/path/to/somelib.tar.gz
$ tar zxf somelib.tar.gz
$ cd somelib
$ ./configure
[... lots of output snipped ...]
$ make
[... lots more output snipped ...]
$ make install
[... a little bit more output, also snipped ...]
In With Makefile.am
The magic of ./configure; make; make install is provided by GNU Autotools. Autotools is actually three separate packages - autoconf, automake, and libtool. Autoconf generates the configure script, which is what makes a package portable and configurable. Automake is a Makefile generator, used with autoconf to produce Makefiles based on what configure finds out about the system. Libtool is a set of shell scripts to build shared libraries in a generic fashion. In reality, you don't use one without the others. As far as I can tell Autotools is not an official name, but everybody knows what it means.
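How does configure find things out about the system? Largely by trying them: it compiles small test programs and sees what happens. The hand-rolled sketch below shows that idea for the std::wstring question. It is not what autoconf generates, and probe_wstring and the conftest file name are illustrative assumptions only.

```shell
# Try to compile a tiny program that uses std::wstring.
# $1 is the compiler command; it defaults to $CXX, then to c++.
# probe_wstring is a hypothetical helper illustrating the idea only.
probe_wstring() {
    compiler="${1:-${CXX:-c++}}"
    workdir=$(mktemp -d) || return 1
    cat > "$workdir/conftest.cpp" <<'EOF'
#include <string>
int main() { std::wstring s(L"probe"); return s.empty() ? 1 : 0; }
EOF
    # Only the compiler's exit status matters, so discard its output.
    if $compiler -c "$workdir/conftest.cpp" -o "$workdir/conftest.o" >/dev/null 2>&1; then
        result=0
    else
        result=1
    fi
    rm -rf "$workdir"
    return $result
}

# Report which setting the old build notes would have called for.
if probe_wstring; then
    echo "wide characters supported, no extra flag needed"
else
    echo "add -DARABICA_NO_WCHAR_T to PARSER_CONFIG"
fi
```

A real configure script runs a great many checks of this sort and records the answers, so nobody has to edit them in by hand.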
You might have noticed a disparaging reference to configure in the
build notes above. I've actually been here before. Six years ago, I
attempted to convert Arabica as was to use Autotools. Even armed with
a hot off the press copy of New Riders' GNU Autoconf, Automake, and
Libtool - a book written by the primary Autotools maintainers - I
made absolutely no progress at all. I found the whole process so
dispiriting and confusing that I abandoned my efforts, subsequently
consigning myself to years of creaking Makefiles and the contempt of
Damon from Montreal.
Six years is a long time in programming. Despite my previous bad experience, I had no doubts that I would, in relatively short order, autoconfiscate my project.
In part two, I tell you how I did it.