Arabica is an XML toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.
- SAX is an event-based XML processing API. Arabica is a full SAX2 implementation, including the optional interfaces and helper classes. It provides uniform SAX2 wrappers for the Expat parser, Xerces, Libxml2 and, on Windows, for the Microsoft XML parser.
- The DOM is a platform- and language-neutral interface, defined by the W3C, which models an XML document as a tree of nodes. Arabica implements the DOM Level 2 Core on top of the SAX layer.
- XPath is a language for addressing parts of an XML document. Arabica implements XPath 1.0 over its DOM implementation.
- XSLT is a language for transforming XML documents into other XML documents. Arabica builds XSLT over its XPath engine.
Arabica is written in Standard C++ and should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, but can easily be customised for arbitrary string types.
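To give a feel for what "parameterised on string type" means, here is a minimal sketch. It is not Arabica's actual interface: simple_string_adaptor, from_ascii and ElementName are hypothetical names invented for illustration. The idea is that generic code never touches the string type directly, but goes through an adaptor which knows how to build and measure it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical adaptor (not Arabica's real one): tells generic code how to
// build and measure a particular string type.
template<class string_type>
struct simple_string_adaptor;

template<>
struct simple_string_adaptor<std::string>
{
  static std::string from_ascii(const char* s) { return std::string(s); }
  static std::size_t length(const std::string& s) { return s.size(); }
};

template<>
struct simple_string_adaptor<std::wstring>
{
  // Widens each char in turn; only valid for 7-bit ASCII input.
  static std::wstring from_ascii(const char* s)
  {
    std::wstring w;
    for (; *s; ++s)
      w.push_back(static_cast<wchar_t>(*s));
    return w;
  }
  static std::size_t length(const std::wstring& s) { return s.size(); }
};

// Hypothetical component parameterised on string type and adaptor.
// All string handling goes through string_adaptor, so supporting a new
// string type only needs a new adaptor.
template<class string_type,
         class string_adaptor = simple_string_adaptor<string_type> >
class ElementName
{
public:
  explicit ElementName(const char* ascii_name) :
    name_(string_adaptor::from_ascii(ascii_name)) { }

  const string_type& name() const { return name_; }
  std::size_t length() const { return string_adaptor::length(name_); }

private:
  string_type name_;
};
```

With this in place, ElementName<std::string> and ElementName<std::wstring> share one implementation, and a custom string class would only need its own adaptor specialisation.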
Arabica is available for download under a BSD-style license.
Latest News
Wednesday 28 May, 2008
#Visual Studio, how I curse your useless warning C4800 'type' : forcing value to bool 'true' or 'false' (performance warning)
Performance warning, my arse.
Friday 18 April, 2008
#XSLT: Implementing position matches[2]
Revisiting position matches at the moment. I've described how position matches need to be written, and the code works and works well.
Except when it doesn't. It actually fails for the less common cases, and it took me a little while to work out why.
Here's the pattern from the test case that showed the problem
    foo[@att1='c'][2]
It wants to match the second node in the set of foo elements with an att1 attribute containing 'c'. Arabica's rewriting finds the positional predicate and applies its incorrect magic. The rewritten pattern is equivalent to
    foo[2][@att1='c']
which picks out the second foo node if it has an att1 attribute containing 'c'. The difference isn't immediately clear, even when you have both the incorrect output and the expected output sitting in front of you.
My small crumb of comfort is that if you do want foo[2][@att1='c'], Arabica does do the right thing. That gave me the clue.
Arabica implements XSLT match patterns by rewriting them as XPath expressions. foo[2][@att1='c'] is rewritten as an XPath along the lines of
    self::foo[. = parent::*/foo[2]][@att1='c']
My faulty rewriting of foo[@att1='c'][2] was
    self::foo[@att1='c'][. = parent::*/foo[2]]
which you should be able to see is logically identical to the above. What I need is
    self::foo[. = parent::*/foo[@att1='c'][2]]
I had to work quite hard to see that this is what it should be, despite being pretty familiar with XPath and XSLT use and implementation. It's only been part of my working toolkit for the last 8 years or so, after all. When rewriting a positional match, any preceding predicates must be folded into the rewritten expression. Now I see it, it's pretty obvious.
Failing tests now pass, which is lovely.
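The difference between foo[@att1='c'][2] and foo[2][@att1='c'] is easier to see with concrete siblings. The following is a toy model only, not Arabica code: Foo, filter_then_position, position_then_filter and sample_siblings are made-up names, and each foo element is reduced to nothing more than its att1 value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a foo element: just its att1 value.
struct Foo
{
  std::string att1;
};

// Models foo[@att1='c'][2]: filter on the attribute first, then take the
// second survivor. Returns its index among the siblings, or -1 if none.
int filter_then_position(const std::vector<Foo>& siblings)
{
  int seen = 0;
  for (std::size_t i = 0; i != siblings.size(); ++i)
    if (siblings[i].att1 == "c" && ++seen == 2)
      return static_cast<int>(i);
  return -1;
}

// Models foo[2][@att1='c']: take the second foo, then keep it only if its
// att1 is 'c'. Returns its index among the siblings, or -1 if none.
int position_then_filter(const std::vector<Foo>& siblings)
{
  if (siblings.size() >= 2 && siblings[1].att1 == "c")
    return 1;
  return -1;
}

// Four foo siblings with att1 = a, c, b, c.
std::vector<Foo> sample_siblings()
{
  const char* values[] = { "a", "c", "b", "c" };
  std::vector<Foo> foos;
  for (std::size_t i = 0; i != 4; ++i)
  {
    Foo f;
    f.att1 = values[i];
    foos.push_back(f);
  }
  return foos;
}
```

For the sample siblings, filter_then_position picks the fourth foo (index 3), while position_then_filter picks the second (index 1). Swapping the order of the predicates changes which node matches, which is why the preceding predicate has to be folded inside the positional test.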
jez, 18th Apr 2008
Friday 08 February, 2008
#Taggle: Parameterised on string_type
The Taggle parser in subversion is now parameterised on string_type and string_adaptor, in exactly the same way as the usual Arabica XMLReader class. The two are now equivalent, which means that all the SAX filters, the DOM builder, XPath, and so on can be applied to Taggle.
Thursday 31 January, 2008
#Moments before I'm about to go to bed, I discover Taggle fails (in a coredumpy way) for documents which have a DOCTYPE declaration. Will have a look at a fix in the morning.
And there's the fix in subversion.
jez, 1st Feb 2008
#Taggle: Building the code
If you've grabbed the code from subversion:
    svn co svn://jezuk.dnsalias.net/jezuk/arabica/branches/tagsoup-port
you might be wondering how to build it.
For Visual Studio 2005 users, open up the vs8\taggle.sln project and build away. It should just work. If it doesn't, then check the project build notes for information on setting up search paths and things.
For Unixy types, you will need a mighty three steps:
- autoreconf: to create the configure script
- ./configure: to dig out where the various bits and pieces Arabica needs are, and to create the Makefiles
- make: to, erm, make everything
Problems, questions, issues? Get in touch.
Get in touch
Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.
