Arabica is an XML toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

Because it sticks to Standard C++, Arabica should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, and it can easily be customised for arbitrary string types.

Arabica is available for download under a BSD-style license.

Latest News

Wednesday 28 May, 2008
#Visual Studio, how I curse your useless warning C4800

'type' : forcing value to bool 'true' or 'false' (performance warning)
Performance warning, my arse.



Friday 18 April, 2008
#XSLT: Implementing position matches[2]

Revisiting position matches at the moment. I've described how position matches need to be written, and the code works and works well.

Except when it doesn't. It actually fails for the less common cases, and it took me a little while to work out why.

Here's the pattern from the test case that showed the problem:

foo[@att1='c'][2]
It wants to match the second node in the set of foo elements with an att1 attribute containing 'c'.

Arabica's rewriting finds the positional predicate and applies its incorrect magic. The rewritten pattern is equivalent to

foo[2][@att1='c']
which picks out the second foo node if it has an att1 attribute containing 'c'. The difference isn't immediately clear, even when you have both the incorrect output and the expected output sitting in front of you.

My small crumb of comfort is that if you do want foo[2][@att1='c'], Arabica does do the right thing. That gave me the clue. Arabica implements XSLT match patterns by rewriting them as XPath expressions.

foo[2][@att1='c']
is rewritten as an XPath along the lines of
self::foo[. = parent::*/foo[2]][@att1='c']

My faulty rewriting of

foo[@att1='c'][2]
was
self::foo[@att1='c'][. = parent::*/foo[2]]
which you should be able to see is logically identical to the above. What I need is
self::foo[. = parent::*/foo[@att1='c'][2]]
I had to work quite hard to see that this is what it should be, despite being pretty familiar with XPath and XSLT use and implementation. It's only been part of my working toolkit for the last 8 years or so, after all. When rewriting a positional match, any preceding predicates must be folded into the rewritten expression. Now I see it, it's pretty obvious.
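
The folding rule can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not Arabica's actual rewriting code: rewrite_match and is_positional are hypothetical names, the predicates are assumed to arrive already split into strings, and only bare numbers are treated as positional.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: true if the predicate is a bare number such as "2".
bool is_positional(const std::string& predicate) {
  if (predicate.empty()) return false;
  for (unsigned char c : predicate)
    if (!std::isdigit(c)) return false;
  return true;
}

// Hypothetical sketch: rewrite the match pattern name[p1][p2]... as an
// XPath test on the context node, in the notation used in this post.
std::string rewrite_match(const std::string& name,
                          const std::vector<std::string>& predicates) {
  std::string folded;   // positional tests emitted so far
  std::string seen;     // every predicate so far, in source order
  std::string pending;  // plain predicates not yet folded into a positional test
  for (const std::string& p : predicates) {
    const std::string bracketed = "[" + p + "]";
    if (is_positional(p)) {
      // Fold all preceding predicates into the rewritten expression.
      folded += "[. = parent::*/" + name + seen + bracketed + "]";
      pending.clear();
    } else {
      pending += bracketed;
    }
    seen += bracketed;
  }
  return "self::" + name + folded + pending;
}
```

With this sketch, rewrite_match("foo", {"@att1='c'", "2"}) gives self::foo[. = parent::*/foo[@att1='c'][2]], while rewrite_match("foo", {"2", "@att1='c'"}) gives self::foo[. = parent::*/foo[2]][@att1='c'].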

Failing tests now pass, which is lovely.

Here's the change.


jez, 18th Apr 2008


Friday 08 February, 2008
#Taggle: Parameterised on string_type

The Taggle parser in subversion is now parameterised on string_type and string_adaptor, in exactly the same way as the usual Arabica XMLReader class. The two are now equivalent, which means that all the SAX filters, the DOM builder, XPath, and so on can be applied to Taggle.
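
As a rough picture of why sharing the parameterisation matters, here is a small self-contained sketch. Everything in it is hypothetical: sketch_adaptor, StrictReader, SoupReader and root_name_length are made-up names for illustration and are not Arabica's classes or its real string_adaptor interface. The point is only that code written once against the shared template shape works with either reader and either string type.

```cpp
#include <cstddef>
#include <string>

// Hypothetical adaptor, NOT Arabica's real string_adaptor: it only knows
// how to build a string_type from a narrow ASCII literal.
template <class string_type>
struct sketch_adaptor {
  static string_type from_ascii(const char* text) {
    const std::string narrow(text);
    return string_type(narrow.begin(), narrow.end());  // ASCII-only widening
  }
};

// Hypothetical stand-ins for two parsers with the same parameterisation.
template <class string_type,
          class string_adaptor = sketch_adaptor<string_type> >
struct StrictReader {
  string_type root_name() const { return string_adaptor::from_ascii("doc"); }
};

template <class string_type,
          class string_adaptor = sketch_adaptor<string_type> >
struct SoupReader {
  string_type root_name() const { return string_adaptor::from_ascii("html"); }
};

// Generic client code, written once against the shared shape.
template <class Reader>
std::size_t root_name_length(const Reader& reader) {
  return reader.root_name().size();
}
```

Here root_name_length accepts a StrictReader<std::string> or a SoupReader<std::wstring> without change, which is the same property that lets existing filters and builders sit on top of a second parser.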



Thursday 31 January, 2008
#

Moments before going to bed, I discover Taggle fails (in a coredumpy way) for documents which have a DOCTYPE declaration. Will have a look at a fix in the morning.

And there's the fix in subversion.
jez, 1st Feb 2008


#Taggle: Building the code

If you've grabbed the code from subversion:

svn co svn://jezuk.dnsalias.net/jezuk/arabica/branches/tagsoup-port
you might be wondering how to build it.

For Visual Studio 2005 users, open up the vs8\taggle.sln project and build away. It should just work. If it doesn't, then check the project build notes for information on setting up search paths and things.

For Unixy types, you will need a mighty three steps:

  1. autoreconf - to create the configure script
  2. ./configure - to dig out where the various bits and pieces Arabica needs are, and to create the Makefiles
  3. make - to, erm, make everything

Problems, questions, issues? Get in touch.




Get in touch

Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.

Have fun

SourceForge Project Page

Jez Higgins