Wednesday 30 January, 2008
Taggle, Arabica's port of the TagSoup HTML parser, now builds and runs. It dodges pretty much every encoding issue on the planet, but as a first go it's really quite pleasing. Give it this -
This is <B>bold, <I>bold italic, </b>italic, </i> normal text
and get this
(OK, you have to squint a bit at the indenting, but that's a separate issue.)
<i>bold italic, </i>
If you want to have a play, check out the tagsoup-port branch from Subversion:
svn co svn://jezuk.dnsalias.net/jezuk/arabica/branches/tagsoup-port
In examples/Taggle, there's a little command-line application that reads HTML documents and prints the corrected markup to the console.
I'll merge this back into the trunk in the next few days.
Why not implement an HTML5 parser instead of porting TagSoup?
zcorpan, 1st Feb 2008
Time and inclination. Porting TagSoup to C++ took me a few hours. It was fun, and quite an easy win. Having done it, I'm surprised that nobody's done it before.
Writing an HTML5 parser needs rather more time than I have: not only writing the code and developing the test suite, but also tracking the standard as it emerges. Even if I had the time, I don't actually have the inclination, because it's not something that really interests me enough right now. Sorry :)
jez, 2nd Feb 2008
Thank you, this is precisely what I wanted.
I've been coding HTML since 3.2. Long after HTML 8.0 has formally broken and obsoleted HTML 5.3 and everything before it, tag soup will still work.