Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and XSLT implementations, written in Standard C++.

Arabica is written in Standard C++ and should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, but can easily be customised for arbitrary string types.

Arabica is available for download under a BSD-style license.

Latest News

Sunday 03 March, 2013
#Rather unbelievably, at least for me, some of this Arabica information has been translated into Serbo-Croat by Vera Djuraskovic.

Wednesday 28 November, 2012
#Arabica Release - 2012 November

Since putting the Arabica source up on GitHub there seems to have been a little surge in interest in it. It might be coincidence, of course, but I've received several emails and patches over the past few weeks. One of those emails prompted me to do something I'd been putting off - parameterise the XSLT engine on string type. All the rest of the library is as string type agnostic as I could make it, allowing you to plug in std::string, std::wstring, or whatever other string class you might fancy. (In testing, I actually use a string type with no public member functions.) The XSLT engine was the last holdout, but no more, and it's the better for it.

If you've been using the XSLT engine what this means is that where you wrote

    Arabica::XSLT::StylesheetCompiler compiler = ...
    std::auto_ptr<Arabica::XSLT::Stylesheet> stylesheet = ...

you now have to write

    Arabica::XSLT::StylesheetCompiler<std::string> compiler = ...
    std::auto_ptr<Arabica::XSLT::Stylesheet<std::string> > stylesheet = ...

If you haven't been using the XSLT engine because the rest of your application uses std::wstring, then now there's nothing to stop you. Dive in!
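
To give a feel for what "parameterised on string type" means in practice, here is a minimal, self-contained sketch. It is not Arabica code: literal_adaptor, TagWriter and string_type_example are hypothetical names made up for illustration, and Arabica's own adaptor mechanism may well look different. The idea is only that generic code never touches a concrete string class directly, so the same template works for std::string and std::wstring.

```cpp
#include <cassert>
#include <string>

// Hypothetical adaptor trait (an assumption for this sketch, not Arabica's
// API): tells generic code how to build a string_type from an ASCII literal.
template<class string_type>
struct literal_adaptor;

template<>
struct literal_adaptor<std::string>
{
  static std::string from_ascii(const char* s) { return std::string(s); }
};

template<>
struct literal_adaptor<std::wstring>
{
  static std::wstring from_ascii(const char* s)
  {
    // Widen each ASCII character in turn.
    std::wstring out;
    for (; *s != '\0'; ++s)
      out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*s)));
    return out;
  }
};

// Hypothetical component parameterised on string type. All literals go
// through the adaptor, so nothing here depends on char or wchar_t.
template<class string_type, class adaptor = literal_adaptor<string_type> >
class TagWriter
{
public:
  string_type element(const string_type& name, const string_type& text) const
  {
    string_type out = adaptor::from_ascii("<");
    out += name;
    out += adaptor::from_ascii(">");
    out += text;
    out += adaptor::from_ascii("</");
    out += name;
    out += adaptor::from_ascii(">");
    return out;
  }
};

// The same template instantiated for narrow and wide strings.
void string_type_example()
{
  TagWriter<std::string> narrow;
  assert(narrow.element("p", "hi") == "<p>hi</p>");

  TagWriter<std::wstring> wide;
  assert(wide.element(L"p", L"hi") == L"<p>hi</p>");
}
```

Supporting another string class in this sketch would just mean writing one more literal_adaptor specialisation.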

Source tar.bz2
http://sourceforge.net/projects/arabica/files/arabica/November-12/arabica-2012-November.tar.bz2/download

Source tar.gz
http://sourceforge.net/projects/arabica/files/arabica/November-12/arabica-2012-November.tar.gz/download

Source zip
http://sourceforge.net/projects/arabica/files/arabica/November-12/arabica-2012-November.zip/download


Changes and Bug Fixes

DOM

  • Expand entity references
  • getLineNumber/getColumnNumber return size_t
  • handle [dtd] pseudo-entity correctly
  • various, mostly obscure, DOM conformance fixes

XSLT

  • XSLT engine is now, like the rest of Arabica, parameterised on string type

Build and installation

  • Solution and project files for Visual Studio 2012 are provided
  • Donated CMake build files included



Friday 07 September, 2012
#Arabica on GitHub
I've migrated the Arabica source code to GitHub.

Thursday 30 December, 2010
#DOM Conformance Tests

Over the past few days, I've been working on Arabica's DOM conformance. Until now, it's been based entirely on my reading, or not, of the relevant W3C Recommendations. I've always been pretty confident it was correct, but a recent bit of undirected Googling reminded me of the W3C DOM Conformance Test Suites and I thought "why not".

The W3C tests are defined in XML and then transformed to code using XSLT. The suite comes with stylesheets to generate Java JUnit tests and JavaScript JSUnit tests. Monkeying up something to generate Arabica-style CppUnit code only took a few minutes, and getting that code compiling and running only took a little bit longer than that. Embarrassingly, some of the existing DOM code didn't compile and nobody had ever noticed. Interrogating a doctype for its entities just isn't that common, I guess.

With that done, and to my relief, nearly all of the 500-odd tests in the Level 1 Core suite passed first time. Most of those that didn't relied on loading an external DTD, and those that remained were primarily around the behaviour of entity references and child nodes of attributes. Good to have it all fixed though.

Thanks to those who put these tests together. It must have been really rather tedious, but all the tests I've looked at in any detail have been good and sensible.

I'll move on to Level 2 Core in due course, but I've got a hankering to wrestle some more of Arabica's XSLT engine to the floor.



Sunday 24 October, 2010
#Arabica Release - 2010 November

For no particular reason other than that people like official releases and there hasn't been one for a very long time, I've cut a new Arabica release. I'm not entirely sure why I've labelled it 2010-November when it's clearly still October. There is no major new feature, just the gentle accumulation of more work on Arabica's XSLT processor along with sundry bug fixes.

Source tar.bz2
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.tar.bz2/download

Source tar.gz
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.tar.gz/download

Source zip
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.zip/download


Changes and Bug Fixes

SAX

  • Exceptions thrown by MSXML are usefully reported, and no longer corrupt the stack
  • updated for most recent Xerces release

DOM

  • Corrected set/get/removeNamedItemNS functions
  • splitText fixed
  • fixed setAttributeNodeNS
  • double delete when removing and re-adding an attribute fixed
  • operator<< extended for wide streams
  • operator<< correctly generates automatic namespace prefixes for attributes

XPath

  • Some optimisations in the expression evaluation
  • variables may now, optionally, be resolved at compile time

XSLT

  • xsl:key and key() implemented
  • cdata-section-elements supported
  • literal result element (aka embedded stylesheets) implemented
  • minor speed optimisations
  • xsl:sort/@lang is still not supported, but now issues a warning rather than throwing an exception
  • function-available implemented
  • element-available stub implementation
  • xsl:sort attributes correctly implemented as attribute value templates
  • allow and ignore attributes in foreign namespaces
  • verify the qualified names used in the stylesheet (eg. as template names) have prefixes which are bound
  • take precedence into account when resolving named templates
  • disallow variables in xsl:key match and use expressions

Build and installation

  • Solution and project files for Visual Studio 7 (2003) and 8 (2005) are no longer provided. A script to generate them from the VS9 files is provided. The results are not guaranteed, but it has worked fine when used previously.

Other bits and bobs

  • Builds without warnings
  • xgrep example application now also outputs non-nodeset results


I never did write the release notes for the previous release, back in March 2009. For completeness' sake, they are

XSLT

  • generate-id implemented
  • detect circular imports and includes
  • escape tabs, carriage returns and line feeds when outputting attribute values
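
The last XSLT item above matters because an XML parser normalises literal tabs, carriage returns and line feeds in attribute values to spaces, so a serialiser has to write them as character references if they are to survive a round trip. A minimal sketch of the idea follows. It is not Arabica's actual code: escape_attribute_value and escape_example are hypothetical names, and the sketch assumes a narrow std::string and double-quoted attribute values.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Arabica): escapes a value so it can be
// written between double quotes as an XML attribute value.
std::string escape_attribute_value(const std::string& value)
{
  std::string out;
  out.reserve(value.size());
  for (std::string::size_type i = 0; i != value.size(); ++i)
  {
    switch (value[i])
    {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '"':  out += "&quot;"; break;
      // Whitespace that attribute value normalisation would otherwise
      // turn into plain spaces is written as character references.
      case '\t': out += "&#9;";   break;
      case '\n': out += "&#10;";  break;
      case '\r': out += "&#13;";  break;
      default:   out += value[i]; break;
    }
  }
  return out;
}

void escape_example()
{
  assert(escape_attribute_value("a\tb\r\n") == "a&#9;b&#13;&#10;");
  assert(escape_attribute_value("plain") == "plain");
}
```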

Other bits and bobs

  • Improved URI parsing




Get in touch

Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.

Have fun

SourceForge Project Page

Jez Higgins