Wednesday 30 January, 2008
#Taggle: And there it is ...

Taggle, Arabica's port of the TagSoup HTML parser, now builds and runs. It dodges pretty much every encoding issue on the planet, but as a first go it's really quite pleasing. Give it this -

This is <B>bold, <I>bold italic, </b>italic, </i> normal text

and get this

    <body>This is
            <i>bold italic, </i>
    <i>italic, </i>
normal text

(Ok, you have to squint a bit at the indenting, but that's a separate issue.)
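
For a feel of the kind of repair involved, here is a minimal sketch of one simple way to untangle mis-nested tags with a stack of open elements. It is not Taggle's or TagSoup's actual code: `repair_nesting` is a hypothetical helper written for illustration, it uses nothing from Arabica, and it assumes bare tags only (no attributes, comments, entities or encoding handling). When an end tag arrives for an element that is not the innermost open one, the sketch closes everything opened since, closes the match, then reopens what it had to close.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of Arabica or Taggle.
// Assumes bare tags such as <b> or </B>: no attributes, comments or entities.
std::string repair_nesting(const std::string& input)
{
  std::vector<std::string> open;  // names of the currently open elements
  std::string out;
  std::string::size_type pos = 0;

  while (pos < input.size())
  {
    if (input[pos] != '<')
    {
      // Copy character data through unchanged, up to the next tag.
      std::string::size_type next = input.find('<', pos);
      if (next == std::string::npos)
        next = input.size();
      out.append(input, pos, next - pos);
      pos = next;
      continue;
    }

    std::string::size_type close = input.find('>', pos);
    if (close == std::string::npos)
    {
      // Unterminated tag: treat the remainder as plain text.
      out.append(input, pos, std::string::npos);
      break;
    }

    // Extract the tag name, note whether it is an end tag, and lower-case it.
    std::string name = input.substr(pos + 1, close - pos - 1);
    pos = close + 1;
    const bool is_end = !name.empty() && name[0] == '/';
    if (is_end)
      name.erase(0, 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (!is_end)
    {
      open.push_back(name);
      out += "<" + name + ">";
      continue;
    }

    // End tag with no matching open element: drop it.
    if (std::find(open.rbegin(), open.rend(), name) == open.rend())
      continue;

    // Close everything opened after the match, remembering it for later.
    std::vector<std::string> reopen;
    while (open.back() != name)
    {
      out += "</" + open.back() + ">";
      reopen.push_back(open.back());
      open.pop_back();
      assert(!open.empty());  // the find() above guarantees a match remains
    }
    out += "</" + name + ">";
    open.pop_back();

    // Reopen the interrupted elements, outermost first.
    for (auto it = reopen.rbegin(); it != reopen.rend(); ++it)
    {
      open.push_back(*it);
      out += "<" + *it + ">";
    }
  }

  // Close anything still open at end of input.
  while (!open.empty())
  {
    out += "</" + open.back() + ">";
    open.pop_back();
  }
  return out;
}
```

Called on the input line above, `repair_nesting` returns `This is <b>bold, <i>bold italic, </i></b><i>italic, </i> normal text`. Unlike a real parser, it adds no enclosing `<html>` or `<body>` elements and does no indenting.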

If you want to have a play, check out the tagsoup-port branch from Subversion:

svn co svn://jezuk.dnsalias.net/jezuk/arabica/branches/tagsoup-port

In examples/Taggle, there's a little command-line application that reads HTML documents and prints the corrected markup to the console.

I'll merge this back into the trunk in the next few days.

Why not implement an HTML5 parser instead of porting TagSoup?
zcorpan, 1st Feb 2008

Time and inclination. Porting TagSoup to C++ took me a few hours. It was fun, and quite an easy win. Having done it, I'm surprised that nobody's done it before.

Writing an HTML5 parser needs rather more time than I have - not only in writing the code and developing the test suite, but then in tracking the standard as it emerges. Even if I had the time, I don't actually have the inclination, because it's not something that really interests me enough right now. Sorry :)

jez, 2nd Feb 2008

Thank you, this is precisely what I wanted.

I've been HTML coding since 3.2. Long after HTML8.0 has formally broken and obsoleted HTML5.3 and previous, tag-soup still works.

David Mullin, 11th Feb 2012

It seems the SVN server is down. Would you please publish the updated download location?
Moritz, 9th Jul 2013
