Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and XSLT implementations, written in Standard C++.
- SAX is an event-based XML processing API. Arabica is a full SAX2 implementation, including the optional interfaces and helper classes. It provides uniform SAX2 wrappers for the Expat parser, Xerces, Libxml2 and, on Windows, for the Microsoft XML parser.
- The DOM is a platform- and language-neutral interface which models an XML document as a tree of nodes, defined by the W3C. Arabica implements the DOM Level 2 Core on top of the SAX layer.
- XPath is a language for addressing parts of an XML document. Arabica implements XPath 1.0 over its DOM implementation.
- XSLT is a language for transforming XML documents into other XML documents. Arabica builds XSLT over its XPath engine.
- In addition to the XML parser, Arabica includes Taggle, an HTML parser derived from TagSoup.
Arabica is written in Standard C++ and should be portable to most platforms. It is parameterised on string type. Out of the box, it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, but can easily be customised for arbitrary string types.
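To give a flavour of what "parameterised on string type" means, here is a minimal sketch. It is not Arabica's own code: `simple_string_adaptor` and `make_qualified_name` are hypothetical names invented for this illustration. The idea is that generic code never assumes a particular string class, and a small adaptor tells it how to build strings of whatever type is in use.

```cpp
#include <string>

// Hypothetical adaptor (an assumption for this sketch, not Arabica's API):
// tells generic code how to build a string_type from a narrow ASCII literal.
template <class string_type>
struct simple_string_adaptor;

template <>
struct simple_string_adaptor<std::string> {
    static std::string from_ascii(const char* s) { return std::string(s); }
};

template <>
struct simple_string_adaptor<std::wstring> {
    static std::wstring from_ascii(const char* s) {
        std::string narrow(s);
        // Widen each ASCII character in turn.
        return std::wstring(narrow.begin(), narrow.end());
    }
};

// Generic code written once, usable with either string type.
// For brevity it relies on string_type providing empty() and operator+;
// a fuller adaptor would supply those operations too.
template <class string_type,
          class adaptor = simple_string_adaptor<string_type> >
string_type make_qualified_name(const string_type& prefix,
                                const string_type& local_name)
{
    if (prefix.empty())
        return local_name;
    return prefix + adaptor::from_ascii(":") + local_name;
}
```

Calling `make_qualified_name(std::string("xsl"), std::string("template"))` and the equivalent call with `std::wstring` arguments both go through the same template. Supporting another string class would, in this sketch, mean writing one more adaptor specialisation.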
Arabica is available for download under a BSD-style license.
Latest News
Thursday 30 December, 2010
DOM Conformance Tests
Over the past few days, I've been working on Arabica's DOM conformance. Until now, it's been based entirely on my reading, or not, of the relevant W3C Recommendations. I've always been pretty confident it was correct, but a recent bit of undirected Googling reminded me of the W3C DOM Conformance Test Suites and I thought "why not".
The W3C tests are defined in XML and then transformed to code using XSLT. The suite comes with stylesheets to generate Java JUnit tests and JavaScript JSUnit tests. Monkeying up something to generate Arabica-style CppUnit code only took a few minutes, and getting that code compiling and running only took a little longer than that. Embarrassingly, some of the existing DOM code didn't compile and nobody had ever noticed. Interrogating a doctype for its entities just isn't that common, I guess.
With that done, and to my relief, nearly all of the 500-odd tests in the Level 1 Core suite passed first time. Most of those that didn't relied on loading an external DTD, and those that remained were primarily around the behaviour of entity references and child nodes of attributes. Good to have it all fixed though.
Thanks to those who put these tests together. It must have been really rather tedious, but all the tests I've looked at in any detail have been good and sensible.
Will move onto Level 2 Core in due course, but got a hankering to wrestle some more of Arabica's XSLT engine to the floor.
Sunday 24 October, 2010
Arabica Release - 2010 November
For no particular reason other than that people like official releases and there hasn't been one for a very long time, I've cut a new Arabica release. I'm not entirely sure why I've labelled it 2010-November when it's clearly still October. There is no major new feature, just the gentle accumulation of more work on Arabica's XSLT processor along with sundry bug fixes.
Source tar.bz2
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.tar.bz2/download
Source tar.gz
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.tar.gz/download
Source zip
http://sourceforge.net/projects/arabica/files/arabica/November-10/arabica-2010-November.zip/download
Changes and Bug Fixes
SAX
- Exceptions thrown by MSXML are usefully reported, and no longer corrupt the stack
- updated for most recent Xerces release
DOM
- Corrected set/get/removeNamedItemNS functions
- splitText fixed
- fixed setAttributeNodeNS
- double delete when removing and re-adding an attribute fixed
- operator<< extended for wide streams
- operator<< correctly generates automatic namespace prefixes for attributes
XPath
- Some optimisations in the expression evaluation
- variables may now, optionally, be resolved at compile time
XSLT
- xsl:key and key() implemented
- cdata-section-elements supported
- literal result element (aka embedded stylesheets) implemented
- minor speed optimisations
- xsl:sort/@lang is still not supported, but now issues a warning rather than throwing an exception
- function-available implemented
- element-available stub implementation
- xsl:sort attributes correctly implemented as attribute value templates
- allow and ignore attributes in foreign namespaces
- verify the qualified names used in the stylesheet (eg. as template names) have prefixes which are bound
- take precedence into account when resolving named templates
- disallow variables in xsl:key match and use expressions
Build and installation
- Solution and project files for Visual Studio 7 (2003) and 8 (2005) are no longer provided. A script to generate them from the VS9 files is provided. The results are not guaranteed, but it has worked fine when used previously.
Other bits and bobs
- Builds without warnings
- xgrep example application now also outputs non-nodeset results
I never did write the release notes for the previous release, back in March 2009. For completeness' sake, they are:
XSLT
- generate-id implemented
- detect circular imports and includes
- escape tabs, carriage returns and line feeds when outputting attribute values
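The last item matters because an XML parser normalises literal tabs, carriage returns and line feeds inside attribute values to spaces, so they only survive a round trip if they are written as character references. A minimal sketch of the idea follows. It is not Arabica's actual implementation, and `escape_attribute_value` is a hypothetical name used only for this illustration.

```cpp
#include <string>

// Hypothetical helper (an assumption for this sketch, not part of Arabica):
// escape a value so it can be written between double quotes as an
// XML attribute value.
std::string escape_attribute_value(const std::string& value)
{
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '&':  out += "&amp;";  break;  // markup characters
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;
            case '\t': out += "&#9;";   break;  // whitespace a parser would
            case '\n': out += "&#10;";  break;  // otherwise normalise to a
            case '\r': out += "&#13;";  break;  // single space
            default:   out += c;        break;
        }
    }
    return out;
}
```

With this sketch, a value holding a tab between `a` and `b` is written as `a&#9;b`, which a conforming parser reads back as the original tab rather than a space.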
Other bits and bobs
- Improved URI parsing
Thursday 05 November, 2009
Arabica source code repository
Entirely through my own stupidity, I managed to corrupt the Arabica Subversion repository. By sheer good luck, I'd been using Bazaar as my front-end client, and so had a clone of the entire repository sitting in my working directory. Accordingly, the Arabica source code is now housed in a Bazaar repository.
The repository can be browsed and you can grab your own working copy over HTTP using
bzr branch http://jezuk.dnsalias.net/arabica-bzr/trunk
Write-access using bzr+ssh is available on request.
Saturday 01 August, 2009
Development snapshots
Arabica code as at 13:00 on the 1st of August:
Friday 13 March, 2009
Arabica March 2009 Release
Just uploaded to SourceForge. Proper release notes to follow, but the main difference is a big performance improvement in Taggle parsing and further work on Arabica's XSLT engine.
Get in touch
Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.
