Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and XSLT implementations, written in Standard C++.

Because it relies only on Standard C++, Arabica should be portable to most platforms. It is parameterised on string type: out of the box it can provide UTF-8 encoded std::strings or UTF-16 encoded std::wstrings, and it can easily be customised for arbitrary string types.
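
To give a feel for what "parameterised on string type" means in practice, here is a minimal sketch. It is not Arabica's API: the names append_ascii and escape_text are hypothetical, made up for this illustration, and the code only assumes a std::basic_string-like type with value_type and push_back. The same template body serves both std::string and std::wstring.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Arabica: appends an ASCII literal to any
// std::basic_string-like type by converting one character at a time.
template <class StringT>
void append_ascii(StringT& out, const char* ascii) {
    using CharT = typename StringT::value_type;
    for (; *ascii != '\0'; ++ascii)
        out.push_back(static_cast<CharT>(*ascii));
}

// Hypothetical example: escapes markup characters in text of any string type.
template <class StringT>
StringT escape_text(const StringT& text) {
    using CharT = typename StringT::value_type;
    StringT out;
    for (CharT c : text) {
        if (c == static_cast<CharT>('<'))      append_ascii(out, "&lt;");
        else if (c == static_cast<CharT>('>')) append_ascii(out, "&gt;");
        else if (c == static_cast<CharT>('&')) append_ascii(out, "&amp;");
        else                                   out.push_back(c);
    }
    return out;
}
```

Calling escape_text(std::string("a<b")) and escape_text(std::wstring(L"a<b")) instantiates the same code for narrow and wide strings. A library written this way throughout can be handed whichever string type the caller prefers.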

Arabica is available for download under a BSD-style license.

Latest News

Wednesday 30 January, 2008
Taggle: And there it is ...

Taggle, Arabica's port of the TagSoup HTML parser, now builds and runs. It dodges pretty much every encoding issue on the planet, but as a first go it's really quite pleasing. Give it this -

This is <B>bold, <I>bold italic, </b>italic, </i> normal text

and get this

<html>
    <body>This is
        <b>bold,
            <i>bold italic, </i>
        </b>
    <i>italic, </i>
normal text
    </body>
</html>
(Ok, you have to squint a bit at the indenting, but that's a separate issue.)
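
For the curious, the kind of repair shown above can be sketched with a stack of open tags. This is a toy illustration only, not Taggle's or TagSoup's actual code, which does a great deal more (attributes, comments, element schemas, the html and body wrappers). The function name repair_nesting is hypothetical, and the sketch assumes bare tags with no attributes. When a close tag arrives out of order, everything opened after its match is closed first and then reopened.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy repair of misnested tags. Hypothetical helper, not part of Arabica.
// Assumes bare tags such as <b> or </I>; attributes are not handled.
std::string repair_nesting(const std::string& in) {
    std::vector<std::string> open;  // stack of currently open tag names
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '<') { out += in[i++]; continue; }  // copy text as-is
        std::size_t end = in.find('>', i);
        if (end == std::string::npos) { out += in.substr(i); break; }
        std::string name = in.substr(i + 1, end - i - 1);
        i = end + 1;
        bool closing = !name.empty() && name[0] == '/';
        if (closing) name.erase(0, 1);
        for (char& c : name)  // normalise tag names to lower case
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (!closing) { open.push_back(name); out += "<" + name + ">"; continue; }

        // Find the innermost open tag with this name.
        std::size_t match = open.size();  // open.size() means "not found"
        for (std::size_t k = open.size(); k-- > 0; )
            if (open[k] == name) { match = k; break; }
        if (match == open.size()) continue;  // stray close tag: drop it

        // Tags opened after the match are closed now and reopened afterwards.
        std::vector<std::string> reopen(
            open.begin() + static_cast<std::ptrdiff_t>(match) + 1, open.end());
        for (std::size_t k = open.size(); k-- > match; )
            out += "</" + open[k] + ">";
        open.resize(match);
        for (const std::string& t : reopen) { open.push_back(t); out += "<" + t + ">"; }
    }
    for (std::size_t k = open.size(); k-- > 0; )  // close anything left open
        out += "</" + open[k] + ">";
    return out;
}
```

Fed the sample line above, this returns "This is <b>bold, <i>bold italic, </i></b><i>italic, </i> normal text", which has the same nesting as the parser's output, minus the html and body wrappers and the indenting.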

If you want to have a play, check out the tagsoup-port branch from subversion:

svn co svn://jezuk.dnsalias.net/jezuk/arabica/branches/tagsoup-port

In examples/Taggle, there's a little command-line application that reads HTML documents and prints the corrected markup to the console.

I'll merge this back into the trunk in the next few days.

Why not implement an HTML5 parser instead of porting TagSoup?
zcorpan, 1st Feb 2008

Time and inclination. Porting TagSoup to C++ took me a few hours. It was fun, and quite an easy win. Having done it, I'm surprised that nobody's done it before.

Writing an HTML5 parser needs rather more time than I have - not only writing the code and developing the test suite, but then tracking the standard as it emerges. Even if I had the time, I don't actually have the inclination, because it's not something that really interests me enough right now. Sorry :)


jez, 2nd Feb 2008

Thank you, this is precisely what I wanted.

I've been HTML coding since 3.2. Long after HTML8.0 has formally broken and obsoleted HTML5.3 and previous, tag-soup still works.


David Mullin, 11th Feb 2012




Get in touch

Your questions, requests, updates and patches are all welcome. I can be contacted at jez@jezuk.co.uk.

Have fun

SourceForge Project Page

Jez Higgins