Arabica is an XML toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

It should be portable to most platforms, and is parameterised on string type. Out of the box, it provides UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, and it can easily be customised for arbitrary string types.

Arabica is available for download under a BSD-style license.

Latest News

Wednesday 06 August, 2008
#Arabica: impending release

Now my latest gentle stroll has concluded, there are one or two platform-specific build issues to resolve. With them done, I expect to be dropping a new release around the end of August or start of September. The release will include the Taggle HTML parser and improved XSLT support, along with various little bug fixes and minor build improvements.

If you can't wait, there's always the subversion repository.

#XSLT: Variable resolution

After a bit of a break, I've spent time hacking on Arabica again, which has been lovely. It's really rather relaxing to just nurdle around in your own code, without any particular pressure or need. My normal way of working on Arabica's XSLT processor is to run some of the test suite, pick a failing case, and fix it. If I can get a few more tests passing in half an hour or an hour, and I generally can, then that's a little step further along.

In this latest little bit of activity, I've been focussing on variables and variable resolution. I've fixed various problems with circular references, scoping, namespace resolution, and what I thought was going to be a thorny problem with import precedence.

What constantly surprises me is how straightforward most of these problems are, requiring only a few lines of code. In fact this has been the story of Arabica's XSLT development. Once the initial development push was done, almost all the rest has been a few lines here, a few lines there. I've been working away on this now for coming up three years, on and off and with digressions, and have no idea when I'll be done, but that doesn't bother me at all. It's like an old pair of slippers, or favourite woolly jumper. It's a comfortable, gentle thing to slip into and go for a stroll in every now and again.



Wednesday 28 May, 2008
#Visual Studio, how I curse your useless warning C4800

'type' : forcing value to bool 'true' or 'false' (performance warning)
Performance warning, my arse.



Friday 18 April, 2008
#XSLT: Implementing position matches[2]

Revisiting position matches at the moment. I've described how position matches need to be written, and the code works and works well.

Except when it doesn't. It actually fails for the less common cases, and it took me a little while to work out why.

Here's the pattern from the test case that showed the problem:

foo[@att1='c'][2]
It wants to match the second node in the set of foo elements with an att1 attribute containing 'c'.

Arabica's rewriting finds the positional predicate and applies its incorrect magic. The rewritten pattern is equivalent to

foo[2][@att1='c']
which picks out the second foo node if it has an att1 attribute containing 'c'. The difference isn't immediately clear, even when you have both the incorrect output and the expected output sitting in front of you.

My small crumb of comfort is that if you do want foo[2][@att1='c'], Arabica does do the right thing. That gave me the clue. Arabica implements XSLT match patterns by rewriting them as XPath expressions.

foo[2][@att1='c']
is rewritten as an XPath along the lines of
self::foo[. = parent::*/foo[2]][@att1='c']

My faulty rewriting of

foo[@att1='c'][2]
was
self::foo[@att1='c'][. = parent::*/foo[2]]
which you should be able to see is logically identical to the above. What I need is
self::foo[. = parent::*/foo[@att1='c'][2]]
I had to work quite hard to see that this is what it should be, despite being pretty familiar with XPath and XSLT use and implementation. It's only been part of my working toolkit for the last 8 years or so, after all. When rewriting a positional match, any preceding predicates must be folded into the rewritten expression. Now I see it, it's pretty obvious.
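
To make the difference between the two predicate orders concrete, here is a minimal sketch in plain C++. It does not use Arabica's API or a real XPath engine. The Siblings type, the no_match value, and both function names are hypothetical, made up for illustration: a run of sibling foo elements is modelled as a vector holding each element's att1 value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a run of sibling foo elements: each entry is
// the value of that element's att1 attribute. Not Arabica's data model.
using Siblings = std::vector<std::string>;

// Hypothetical "nothing selected" result.
const std::size_t no_match = static_cast<std::size_t>(-1);

// Models foo[@att1='c'][2]: filter on the attribute first, then take the
// node at the given 1-based position within the filtered set.
// Returns the index of the selected sibling, or no_match.
std::size_t filter_then_position(const Siblings& foos,
                                 const std::string& value,
                                 std::size_t position)
{
  std::size_t seen = 0;
  for (std::size_t i = 0; i != foos.size(); ++i)
    if (foos[i] == value && ++seen == position)
      return i;
  return no_match;
}

// Models foo[2][@att1='c']: take the foo at the given 1-based position
// among all the siblings first, then keep it only if its attribute matches.
std::size_t position_then_filter(const Siblings& foos,
                                 const std::string& value,
                                 std::size_t position)
{
  if (position == 0 || position > foos.size())
    return no_match;
  const std::size_t i = position - 1;
  return foos[i] == value ? i : no_match;
}
```

For siblings with att1 values c, a, c, filtering first selects the third element, while taking position 2 first selects nothing, because the second foo has att1='a'. That is the kind of mismatch the faulty rewrite produced.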

Failing tests now pass, which is lovely.

Here's the change.


jez, 18th Apr 2008


Friday 08 February, 2008
#Taggle: Parameterised on string_type

The Taggle parser in subversion is now parameterised on string_type and string_adaptor, in exactly the same way as the usual Arabica XMLReader class. The two are now equivalent, which means that all the SAX filters, the DOM builder, XPath, and so on can be applied to Taggle.
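
For readers who haven't met this style of parameterisation, the following is a rough sketch of the general idea only. It is not Arabica's or Taggle's code: simple_adaptor, from_utf8, and tag_matcher are hypothetical names invented for this example, and Arabica's real string_adaptor interface is richer than this.

```cpp
#include <string>

// Hypothetical adaptor: tells generic code how to build a string_type
// from narrow text. Illustrative only, not Arabica's interface.
template<class string_type> struct simple_adaptor;

template<> struct simple_adaptor<std::string>
{
  static std::string from_utf8(const char* s) { return std::string(s); }
};

template<> struct simple_adaptor<std::wstring>
{
  // Assumes ASCII-only input, which is enough for this sketch.
  static std::wstring from_utf8(const char* s)
  {
    std::wstring out;
    for (; *s; ++s)
      out.push_back(static_cast<wchar_t>(*s));
    return out;
  }
};

// Hypothetical component parameterised on string_type and string_adaptor.
// It never names a concrete string class; all conversions go through
// the adaptor.
template<class string_type,
         class string_adaptor = simple_adaptor<string_type> >
struct tag_matcher
{
  bool is_html(const string_type& name) const
  {
    return name == string_adaptor::from_utf8("html");
  }
};
```

Because the component only talks to its string type through the adaptor, the same code serves tag_matcher<std::string> and tag_matcher<std::wstring>, and a custom string class needs only its own adaptor.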



Older news ...

Get in touch

Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.

Have fun

SourceForge Project Page

Jez Higgins