Arabica is an XML toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.
- SAX is an event-based XML processing API. Arabica is a full SAX2 implementation, including the optional interfaces and helper classes. It provides uniform SAX2 wrappers for the Expat parser, Xerces, Libxml2 and, on Windows, for the Microsoft XML parser.
- The DOM, defined by the W3C, is a platform- and language-neutral interface which models an XML document as a tree of nodes. Arabica implements the DOM Level 2 Core on top of the SAX layer.
- XPath is a language for addressing parts of an XML document. Arabica implements XPath 1.0 over its DOM implementation.
- XSLT is a language for transforming XML documents into other XML documents. Arabica builds XSLT over its XPath engine.
Arabica is written in Standard C++ and should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, but can easily be customised for arbitrary string types.
Arabica is available for download under a BSD-style license.
Latest News
Wednesday 06 August, 2008
#Arabica: impending release
Now my latest gentle stroll has concluded, there are one or two platform-specific build issues to resolve. With them done, I expect to be dropping a new release around the end of August or start of September. The release will include the Taggle HTML parser and improved XSLT support, along with various little bug fixes and minor build improvements.
If you can't wait, there's always the subversion repository.
#XSLT: Variable resolution
After a bit of a break, I've spent time hacking on Arabica again, which has been lovely. It's really rather relaxing to just nurdle around in your own code, without any particular pressure or need. My normal way of working on Arabica's XSLT processor is to run some of the test suite, pick a failing case, and fix it. If I can get a few more tests passing in half an hour or an hour, and I generally can, then that's a little step further along.
In this latest little bit of activity, I've been focussing on variables and variable resolution. I've fixed various problems with circular references, scoping, namespace resolution, and what I thought was going to be a thorny problem with import precedence.
What constantly surprises me is how straightforward most of these problems are, requiring only a few lines of code. In fact this has been the story of Arabica's XSLT development. Once the initial development push was done, almost all the rest has been a few lines here, a few lines there. I've been working away on this now for coming up three years, on and off and with digressions, and have no idea when I'll be done, but that doesn't bother me at all. It's like an old pair of slippers, or favourite woolly jumper. It's a comfortable, gentle thing to slip into and go for a stroll in every now and again.
Wednesday 28 May, 2008
#Visual Studio, how I curse your useless warning C4800
'type' : forcing value to bool 'true' or 'false' (performance warning)
Performance warning, my arse.
Friday 18 April, 2008
#XSLT: Implementing position matches[2]
Revisiting position matches at the moment. I've described how position matches need to be written, and the code works and works well.
Except when it doesn't. It actually fails for the less common cases, and it took me a little while to work out why.
Here's the pattern from the test case that showed the problem:
    foo[@att1='c'][2]
It wants to match the second node in the set of foo elements with an att1 attribute containing 'c'. Arabica's rewriting finds the positional predicate and applies its incorrect magic. The rewritten pattern is equivalent to
    foo[2][@att1='c']
which picks out the second foo node if it has an att1 attribute containing 'c'. The difference isn't immediately clear, even when you have both the incorrect output and the expected output sitting in front of you.
My small crumb of comfort is that if you do want foo[2][@att1='c'], Arabica does do the right thing. That gave me the clue. Arabica implements XSLT match patterns by rewriting them as XPath expressions. The pattern
    foo[2][@att1='c']
is rewritten as an XPath along the lines of
    self::foo[. = parent::*/foo[2]][@att1='c']
My faulty rewriting of
    foo[@att1='c'][2]
was
    self::foo[@att1='c'][. = parent::*/foo[2]]
which you should be able to see is logically identical to the above. What I need is
    self::foo[. = parent::*/foo[@att1='c'][2]]
I had to work quite hard to see that this is what it should be, despite being pretty familiar with XPath and XSLT use and implementation. It's only been part of my working toolkit for the last 8 years or so, after all. When rewriting a positional match, any preceding predicates must be folded into the rewritten expression. Now I see it, it's pretty obvious.
Failing tests now pass, which is lovely.
jez, 18th Apr 2008
Friday 08 February, 2008
#Taggle: Parameterised on string_type
The Taggle parser in subversion is now parameterised on string_type and string_adaptor, in exactly the same way as the usual Arabica XMLReader class. The two are now equivalent, which means that all the SAX filters, the DOM builder, XPath, and so on can be applied to Taggle.
Get in touch
Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.
