<?xml version="1.0"?><!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd"><rss version="0.91"><channel>  <title>Arabica XML Toolkit in C++</title>  <description>Arabica Development Log</description>  <link>/arabica/log</link>  <language>en-gb</language>  <webMaster>jez@jezuk.co.uk</webMaster>  
<item><title>Arabica source code repository</title><link>http://www.jezuk.co.uk/arabica/log?id=3991</link><description><![CDATA[ <p>Entirely through my own stupidity, I managed to corrupt the Arabica subversion repository.  By sheer good luck, I'd been using <a href='http://bazaar-vcs.org/'>Bazaar</a> as my front-end client, and so had a clone of the entire repository sitting in my working directory.  Accordingly, the Arabica source code is now housed in a Bazaar repository.</p>
<p>The repository can be <a href='http://jezuk.dnsalias.net/arabica/'>browsed</a> and you can grab your own working copy over HTTP using <pre>
  bzr branch http://jezuk.dnsalias.net/arabica-bzr/trunk
</pre>
Write-access using <code>bzr+ssh</code> is available on request.</p> ]]></description></item>
<item><title>Development snapshots</title><link>http://www.jezuk.co.uk/arabica/log?id=3940</link><description><![CDATA[ <p>Arabica code as at 13:00 on the 1st of August:
<ul>
  <li><a href='/files/arabica-2009-summer.tar.bz2'>Tar.bz2 bundle</a></li>
  <li><a href='/files/arabica-2009-summer.tar.gz'>Tar.gz bundle</a></li>
  <li><a href='/files/arabica-2009-summer.zip'>Zip bundle</a></li>
</ul>
</p> ]]></description></item>
<item><title>Arabica March 2009 Release</title><link>http://www.jezuk.co.uk/arabica/log?id=3910</link><description><![CDATA[ <p>Just uploaded to <a href='https://sourceforge.net/project/platformdownload.php?group_id=56163'>Sourceforge</a>.  Proper release notes to follow, but the main differences are a big performance improvement in Taggle parsing and further work on Arabica's XSLT engine.</p> ]]></description></item>
<item><title>Just wrote quite a long piece about what's been going on in Arabica over the past four months then, like a burk, killed Firefox</title><link>http://www.jezuk.co.uk/arabica/log?id=3907</link><description><![CDATA[ <p>Just wrote quite a long piece about what's been going on in Arabica over the past four months then, like a burk, killed Firefox.  Hurrr.</p>
<p>What I'd said, in a rather long-winded and rambling way, was that import precedence now works correctly in all cases, rather than just the common ones; a couple of nagging little bits got sorted out; and over the past few weeks I've implemented xsl:key and key().  As many times before, James Clark's concise and subtle <a href='http://www.w3.org/TR/xslt#key'>spec</a> text has been a pleasure to work with, and I've surprised myself with how easily I've been able to implement a feature.  I've been working with this code for a long time now, but it really is holding up.</p> ]]></description></item>
<item><title>FAQ: When will Arabica&apos;s XSLT library be finished?</title><link>http://www.jezuk.co.uk/arabica/log?id=3870</link><description><![CDATA[ <p>To tell the truth, I have no idea.  Development of Mangle, Arabica's XSLT engine, is ongoing, although progress varies according to the vagaries of how busy I am, how energetic I'm feeling, whether the kids have a swimming gala, and so on and so forth.</p>
<p>Although it's not done yet, it might well be done enough.  I'm using the OASIS XSLT test suite to help drive development, so it also provides a measure of how much has been done, what's working and what isn't.  The <a href='http://spreadsheets.google.com/pub?key=pQSUogJPG5pARCFUTTYNhWg'>results are published here</a>, and all the code and test data are included in the <a href='/arabica/code'>download</a>.  The executive summary is that the core stuff you use every day works, but some of the bits round the edges (edges defined by my experience, anyway) are missing.</p>
<p>To my knowledge there's nothing that causes Mangle to crash, and anything that I haven't yet implemented generates a warning when the stylesheet is compiled.</p>
<p>Give it a go.  It might do what you need.</p> ]]></description></item>
<item><title>FAQ: What are all those failing tests, and why are they ignored?</title><link>http://www.jezuk.co.uk/arabica/log?id=3868</link><description><![CDATA[ <p>If you run the tests, the final test suite, which exercises the XSLT engine, will list a number of failures.  Quite a large number.  XSLT development is ongoing, and I'm using the OASIS XSLT test suite to guide that.  Consequently, the tests that fail generally indicate something I haven't done yet, rather than an actual bug.  The XSLT tests are, therefore, ignored by <code>make check</code> (should you be lucky enough to be working on a Unixy platform).</p>
<p>Failures in any other tests are, however, indicative of a problem that needs investigating.</p>
 ]]></description></item>
<item><title>Arabica October 2008 Release</title><link>http://www.jezuk.co.uk/arabica/log?id=3866</link><description><![CDATA[ <p>The "Probably long overdue release" bringing a big chunk of new functionality.</p>

<p>Source tar.bz2<br/>
<a href='http://downloads.sourceforge.net/arabica/arabica-2008-october.tar.bz2'>http://downloads.sourceforge.net/arabica/arabica-2008-october.tar.bz2</a></p>
<p>Source tar.gz<br/>
<a href='http://downloads.sourceforge.net/arabica/arabica-2008-october.tar.gz'>http://downloads.sourceforge.net/arabica/arabica-2008-october.tar.gz</a></p>
<p>Source zip<br/>
<a href='http://downloads.sourceforge.net/arabica/arabica-2008-october.zip'>http://downloads.sourceforge.net/arabica/arabica-2008-october.zip</a></p>

<h3>Exciting New Stuff</h3>
<p>The exciting new stuff is <strong>Taggle</strong>, a port of <a href='http://www.ccil.org/~cowan'>John Cowan</a>'s rather super <a href='http://tagsoup.info/'>TagSoup</a> package.</p>
<p>TagSoup, if you're not familiar with it, is
<blockquote>a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: <A href='http://oregonstate.edu/instruct/phl302/texts/hobbes/leviathan-c.html'>poor, nasty and brutish</a>, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. </blockquote>
Obviously, if you have a SAX parser you can apply all your standard XML techniques - not only SAX filters, but building a DOM, <a href='http://www.jezuk.co.uk/jez/2005October#2643'>applying XPaths</a>, or XSLT transformations as well.</p>
<p>Cowan describes what TagSoup does as
<blockquote>TagSoup is designed as a parser, not a whole application; it isn't intended to permanently clean up bad HTML, as HTML Tidy does, only to parse it on the fly. Therefore, it does not convert presentation HTML to CSS or anything similar. It does guarantee well-structured results: tags will wind up properly nested, default attributes will appear appropriately, and so on.<br/><br/>
    The semantics of TagSoup are as far as practical those of actual HTML browsers. In particular, never, never will it throw any sort of syntax error: the TagSoup motto is "Just Keep On Truckin'". But there's much, much more. For example, if the first tag is LI, it will supply the application with enclosing HTML, BODY, and UL tags. Why UL? Because that's what browsers assume in this situation. For the same reason, overlapping tags are correctly restarted whenever possible: text like:<br/>
<br/>
    <code>This is &lt;B&gt;bold, &lt;I&gt;bold italic, &lt;/b&gt;italic, &lt;/i&gt;normal text</code>
<br/>
    gets correctly rewritten as:<br/>
<br/>
    <code>This is &lt;b&gt;bold, &lt;i&gt;bold italic, &lt;/i&gt;&lt;/b&gt;&lt;i&gt;italic, &lt;/i&gt;normal text.</code>
</blockquote>
Looks straightforward, doesn't it? Well, that's a simple example, and it's still tricky and awkward to get right in practice. Cowan's patience in pursuing this, and what looks like a rather elegant solution, are to be applauded. Porting his code to C++ was quick and painless, and Taggle is a useful addition to Arabica.  Thanks, John.</p>
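<p>To make the restarting behaviour a little more concrete, here is a toy C++ sketch of the idea.  It is not Taggle's code: <code>restartOverlappingTags</code> is a made-up name, and it assumes tags carry no attributes and the input holds no comments or other markup.  It keeps a stack of open elements.  When an end tag arrives for an element that isn't innermost, it closes the intervening elements, closes the target, then reopens the intervening ones.</p>
<pre>
#include &lt;algorithm&gt;
#include &lt;cctype&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Toy sketch, not Taggle's code: rewrite overlapping inline tags so that the
// output is properly nested.  Assumes tags have no attributes and the input
// contains no comments, CDATA sections or processing instructions.
std::string restartOverlappingTags(const std::string&amp; in)
{
  std::string out;
  std::vector&lt;std::string&gt; open;  // currently open elements, innermost last
  std::string::size_type pos = 0;
  while (pos &lt; in.size())
  {
    std::string::size_type lt = in.find('&lt;', pos);
    if (lt == std::string::npos) { out += in.substr(pos); break; }
    out += in.substr(pos, lt - pos);  // text before the tag passes straight through
    std::string::size_type gt = in.find('&gt;', lt);
    if (gt == std::string::npos) { out += in.substr(lt); break; }  // unterminated: keep as text
    std::string name = in.substr(lt + 1, gt - lt - 1);
    pos = gt + 1;
    bool isEnd = !name.empty() &amp;&amp; name[0] == '/';
    if (isEnd) name.erase(0, 1);
    for (char&amp; c : name)  // HTML element names are case-insensitive
      c = static_cast&lt;char&gt;(std::tolower(static_cast&lt;unsigned char&gt;(c)));

    if (!isEnd) { out += "&lt;" + name + "&gt;"; open.push_back(name); continue; }
    if (std::find(open.begin(), open.end(), name) == open.end()) continue;  // stray end tag: drop it

    std::vector&lt;std::string&gt; reopen;  // closed early, innermost first
    while (open.back() != name)
    {
      out += "&lt;/" + open.back() + "&gt;";
      reopen.push_back(open.back());
      open.pop_back();
    }
    out += "&lt;/" + name + "&gt;";
    open.pop_back();
    for (auto it = reopen.rbegin(); it != reopen.rend(); ++it)  // reopen, outermost first
    {
      out += "&lt;" + *it + "&gt;";
      open.push_back(*it);
    }
  }
  while (!open.empty())  // close anything still open at the end of the input
  {
    out += "&lt;/" + open.back() + "&gt;";
    open.pop_back();
  }
  return out;
}
</pre>
<p>Fed the bold/italic example from the quotation, it returns the same nesting that TagSoup produces.  An end tag for an element that isn't open is simply dropped.</p>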
<p>Arabica Taggle chews through HTML, providing the same SAX XMLReader interface as the XML parser, and can be used in exactly the same way.  HTML source can be fed through SAX filter stacks, used to build DOM trees, queried with XPath, or transformed using XSLT.</p>
<hr/>
<h3>Changes and Bug Fixes</h3>
<p>There are, of course, many other fixes and changes.  Most are relatively minor, and if you haven't been bitten by them you won't notice.  The most significant changes are in Arabica's XSLT engine, Mangle.  While it is still under development and not yet feature complete, it takes a fairly big step forward in this release.</p>

<p><strong>SAX</strong>
<ul>
  <li>Fixed <code>AttributesImpl.getIndex</code>.  Thanks to Isak Johnsson for that, and <i>what on earth was I thinking</i> to me</li>
  <li>Return attribute type as "CDATA" not the empty string</li>
  <li>After all this time, I realised I had too many template parameters on <a href='http://jezuk.dnsalias.net/viewvc/trunk/include/SAX/XMLReader.hpp?view=markup'><code>XMLReaderInterface</code></a>.  It only needs the <code>string_type</code> and <code>string_adaptor</code>.  Any additional parameters are only of interest to the implementing parser class</li>
</ul>
</p>
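<p>The reasoning behind trimming that parameter list can be shown with a small C++ sketch.  The names are invented for illustration and are not Arabica's declarations.  The interface is parameterised only on the string type and adaptor, an implementation is free to add extra parameters of its own, and client code written against the interface never has to mention them.</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;string&gt;

// Hypothetical, heavily simplified shapes; not Arabica's real declarations.
template&lt;class string_type, class string_adaptor&gt;
class ReaderInterface
{
public:
  virtual ~ReaderInterface() { }
  virtual std::size_t parse(const string_type&amp; source) = 0;
};

// An implementation may want extra parameters of its own (here a made-up,
// non-zero buffer size) without them leaking into the interface.
template&lt;class string_type, class string_adaptor, std::size_t bufferSize&gt;
class ToyParser : public ReaderInterface&lt;string_type, string_adaptor&gt;
{
public:
  // Pretend to read the source in bufferSize chunks; return the number of reads.
  std::size_t parse(const string_type&amp; source) override
  {
    return (source.size() + bufferSize - 1) / bufferSize;
  }
};

// Client code names only the two parameters the interface needs.
template&lt;class string_type, class string_adaptor&gt;
std::size_t runParser(ReaderInterface&lt;string_type, string_adaptor&gt;&amp; reader,
                      const string_type&amp; source)
{
  return reader.parse(source);
}

struct ToyAdaptor { };  // stand-in for a real string adaptor
</pre>
<p>Calling <code>runParser(p, source)</code> with <code>p</code> a <code>ToyParser&lt;std::string, ToyAdaptor, 4&gt;</code> deduces the two interface parameters from the parser's base class, so the buffer size never appears in the client code.</p>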

<p><strong>DOM</strong>
<ul>
  <li>Output DocumentFragment properly</li>
	<li>Output &lt;elem/&gt; for empty elements</li>
  <li>Slipped a <a href='http://jezuk.dnsalias.net/viewvc/trunk/include/SAX/filter/TextCoalescer.hpp?view=markup'><code>TextCoalescer</code></a> filter into the DOM builder, so that consecutive bits of text get applied to a single Text or CDATA node, rather than as a series of nodes.  (A series of nodes is perfectly legal, it's just slightly unexpected.  Even to me, and I work with DOMs a lot :) </li>
</ul>
</p>
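<p>The coalescing idea is simple enough to sketch in a few lines of C++.  This is a hypothetical, batch-style illustration over a made-up <code>Event</code> type, not Arabica's <code>TextCoalescer</code>, which, being a SAX filter, presumably works on events as they stream past.  Adjacent character events are merged, and anything else passes through unchanged.</p>
<pre>
#include &lt;string&gt;
#include &lt;vector&gt;

// Made-up, much-simplified event type standing in for SAX callbacks.
struct Event
{
  enum Kind { StartElement, EndElement, Characters } kind;
  std::string data;  // element name, or text for Characters events
};

// Merge each run of adjacent Characters events into a single event, so a
// DOM builder downstream would create one Text node per run.
std::vector&lt;Event&gt; coalesceText(const std::vector&lt;Event&gt;&amp; events)
{
  std::vector&lt;Event&gt; out;
  for (const Event&amp; e : events)
  {
    if (e.kind == Event::Characters &amp;&amp; !out.empty() &amp;&amp; out.back().kind == Event::Characters)
      out.back().data += e.data;  // extend the current text run
    else
      out.push_back(e);           // anything else passes through unchanged
  }
  return out;
}
</pre>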
 
<p><strong>XPath</strong>
<ul>
  <li>Some time ago, it was gently suggested to me that <code>XPathValuePtr</code> and <code>XPathExpressionPtr</code> both exposed implementation details and provided an interface that was inconsistent with the DOM classes, because you accessed the member functions via <code>-&gt;</code> rather than <code>.</code>  At the time, I was just pleased to have got the XPath stuff done and wasn't really fussed, so I left it.  Since then, though, it's niggled and niggled away at the back of my mind, and now I've done something about it.  <code>XPathValuePtr</code> has become <code>XPathValue</code> and <code>XPathExpressionPtr</code> has become <code>XPathExpression</code>, with the member functions accessed through the <code>.</code> operator.  The <code>XPathValuePtr</code> and <code>XPathExpressionPtr</code> names and <code>-&gt;</code> member access are retained for the time being, so existing code won't be broken, but new code should use <code>XPathValue</code> and <code>XPathExpression</code></li>
  <li>Correctly implemented namespace nodes.  The XPath data model requires that namespace nodes are associated with an element, and sort ahead of attribute nodes in document order.  Until now, Arabica's namespace nodes had no parent or owner document, and so failed these requirements</li>
  <li>The default namespace is included when constructing namespace nodes </li>
  <li>Amazingly, the XPath <code>prefix:*</code> didn't compile.  I had no test for it, and had overlooked it.  Now I have one, and it compiles</li>
  <li>Unbound namespace prefixes throw an exception</li>
  <li>Corrected <code>text()</code> test to match CDATA nodes as well as text nodes</li>
  <li>XPaths are now evaluated as if the DOM had been normalised, even if it hasn't.  That is, consecutive text nodes are treated as a single node</li>
</ul>
</p>
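<p>One way to get value-style <code>.</code> access while keeping old <code>-&gt;</code> code compiling is sketched below.  It is a hypothetical illustration, not Arabica's implementation: <code>ValueImpl</code>, <code>Value</code> and <code>ValuePtr</code> are invented names.  The handle holds a shared pointer to an immutable result, forwards member functions through <code>.</code>, and keeps an <code>operator-&gt;</code> that simply returns the handle itself.</p>
<pre>
#include &lt;memory&gt;

// Invented stand-in for an XPath result implementation.
class ValueImpl
{
public:
  explicit ValueImpl(double n) : number_(n) { }
  double asNumber() const { return number_; }
  bool asBool() const { return number_ != 0.0 &amp;&amp; number_ == number_; }  // 0 and NaN are false
private:
  double number_;
};

// Value-semantic handle: cheap to copy, member functions reached with '.'.
class Value
{
public:
  explicit Value(double n) : impl_(std::make_shared&lt;ValueImpl&gt;(n)) { }
  double asNumber() const { return impl_-&gt;asNumber(); }
  bool asBool() const { return impl_-&gt;asBool(); }

  // Kept for backwards compatibility: v-&gt;asNumber() still compiles.
  const Value* operator-&gt;() const { return this; }
private:
  std::shared_ptr&lt;const ValueImpl&gt; impl_;  // copies share one immutable result
};

using ValuePtr = Value;  // old name retained as an alias
</pre>
<p>Because copies share the same immutable implementation, passing a <code>Value</code> around by value costs no more than passing the old smart pointer did.</p>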

<p><strong>XSLT</strong>
<ul>
  <li>Params are not passed on through an <code>xsl:apply-imports</code> call</li>
  <li>Template names are now QNames</li>
  <li>Template modes are now QNames</li>
  <li>In XPath <code>node()</code> matches any node of any type.  In an XSLT match pattern, <code>node()</code> matches everything except attributes and the document root node.  Fixed.</li>
  <li>Fixed variable scoping in <code>xsl:for-each</code>, <code>xsl:if</code>, and <code>xsl:choose</code></li>
  <li>Escape naughty text when outputting processing instructions and comments (eg ---)</li>
  <li>Use <code>std::stable_sort</code> instead of <code>std::sort</code>.  When <code>xsl:sort</code> specifies a numerical sort but some of the data isn't numeric, we need to maintain the relative positions of that non-numeric data.  This is the first time I can recall actually using <code>std::stable_sort</code>.  I will mark it down in my big book of programming accomplishments.</li>
  <li>Fixed local-name for namespace nodes</li>
  <li><code>xsl:message</code> can contain another <code>xsl:message</code> - now handled properly</li>
  <li>Empty comments output correctly  </li>
  <li>Ensure <code>xsl:choose</code> has at least one <code>xsl:when</code></li>
  <li>Make sure any <code>xsl:template</code> <code>mode</code> attribute is not empty</li>
  <li>Verify <code>xsl:sort</code> attribute values </li>
  <li><code>xsl:call-template</code> now throws if it can't find a matching template</li>
  <li>Duplicate variable and parameter names are rejected</li>
  <li>Disallowed <code>current()</code> in match patterns</li>
  <li>Verify <code>xsl:for-each</code> selects a node-set</li>
  <li>Disallow pcdata ahead of an <code>xsl:param</code></li>
  <li><code>xsl:stylesheet</code> now allows top-level elements when they are in a foreign namespace </li>
  <li>Implemented <code>position()</code>, <code>last()</code> and positional predicates in match patterns</li>
  <li>Throw error if transform is run with no input</li>
  <li>Verify QNames at transform compile time</li>
	<li>Detect circular variable references</li>
  <li>Reject variables and parameters which have both a <code>select</code> attribute and text content</li>
  <li>Top level variables and parameters handled according to import precedence</li>
  <li>Fixed internal QName resolution - unprefixed names are not in the default namespace</li>
  <li>Fixed unprefixed names in <code>xsl:element</code> - when no namespace URI is supplied, they are in the default namespace</li>
  <li>Don't suppress output of element namespace prefixes or attributes which are in the XSL namespace</li>
  <li>Ensure <code>@xmlns|@xmlns:*</code> selects no nodes</li>
  <li>Direct information messages to <code>std::cerr</code>, not <code>std::cout</code></li>
</ul>
</p>
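<p>Why <code>std::stable_sort</code> matters here can be shown with a small sketch.  The XSLT 1.0 spec asks for a stable sort, so keys that compare equal stay in document order.  This sketch assumes non-numeric keys become NaN and sort before every number, which makes all of them equivalent to one another; only a stable sort is guaranteed to leave them in their original order.  The helper names are hypothetical, the sketch sorts the keys themselves rather than the nodes they belong to, and <code>std::strtod</code> only approximates XPath's <code>number()</code> (it accepts things like <code>inf</code>, for instance).</p>
<pre>
#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;cstdlib&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Rough, hypothetical stand-in for XPath number(): NaN unless strtod
// consumes the whole string.
double toNumber(const std::string&amp; s)
{
  if (s.empty()) return std::nan("");
  char* end = nullptr;
  double d = std::strtod(s.c_str(), &amp;end);
  return *end == '\0' ? d : std::nan("");
}

// Ascending numeric order with NaN before every number.  All NaNs are
// equivalent, so this is still a valid strict weak ordering.
bool numericLess(const std::string&amp; a, const std::string&amp; b)
{
  double x = toNumber(a);
  double y = toNumber(b);
  if (std::isnan(x)) return !std::isnan(y);
  if (std::isnan(y)) return false;
  return x &lt; y;
}

// std::stable_sort keeps equivalent keys, including all the non-numeric
// ones, in their original relative order; std::sort makes no such promise.
void sortNumerically(std::vector&lt;std::string&gt;&amp; keys)
{
  std::stable_sort(keys.begin(), keys.end(), numericLess);
}
</pre>
<p>For example, the keys <code>10, pear, 2, apple, 1</code> come out as <code>pear, apple, 1, 2, 10</code>, with <code>pear</code> still ahead of <code>apple</code>.</p>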

<p><strong>Build and installation</strong>
<ul>
  <li>Fix for a problem installing headers on FreeBSD, where <code>install</code> doesn't understand <code>-D</code></li>
  <li>Changes to help out-of-tree builds </li>
  <li>Added build files for Visual Studio 2008</li>
  <li>Added configure tests for <code>std::mbstate_t</code> and/or <code>mbstate_t</code>.  Some platforms don't have it (VxWorks, for example)</li>
  <li>Visual Studio 2005 and 2003 project files are now munged from the Visual Studio 2008 files.  (Don't try this at home, folks)</li> 
</ul>
</p>

<p><strong>Other bits and bobs</strong>
<ul>
  <li>Fix for base URIs with leading <code>../</code></li>
  <li>Convert \ to / for relative paths as well as absolute Windows paths.</li>
</ul>
</p>
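<p>The two path fixes can be illustrated with a short C++ sketch.  The function names are hypothetical and this is not Arabica's code; it also assumes both paths are relative, so drive letters and a leading <code>/</code> are not handled.  Backslashes are turned into forward slashes first, then the relative path is joined to the base document's directory, and any <code>..</code> segment with nothing left to cancel is kept rather than thrown away.</p>
<pre>
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical helpers, not Arabica's code.  Both paths are assumed to be
// relative; drive letters and a leading '/' are not handled.
std::string toSlashes(std::string path)
{
  std::replace(path.begin(), path.end(), '\\', '/');  // Windows separators become '/'
  return path;
}

// Resolve `relative` against the directory of the document at `base`.  A ".."
// with nothing left to cancel is kept, so bases like "../docs/a.xml" work.
std::string resolveRelative(const std::string&amp; base, const std::string&amp; relative)
{
  std::string b = toSlashes(base);
  std::string::size_type slash = b.rfind('/');
  std::string dir = (slash == std::string::npos) ? std::string() : b.substr(0, slash + 1);
  std::istringstream in(dir + toSlashes(relative));

  std::vector&lt;std::string&gt; segs;
  std::string seg;
  while (std::getline(in, seg, '/'))
  {
    if (seg.empty() || seg == ".") continue;
    if (seg == ".." &amp;&amp; !segs.empty() &amp;&amp; segs.back() != "..")
      segs.pop_back();  // ".." cancels the previous real segment
    else
      segs.push_back(seg);
  }

  std::string out;
  for (std::size_t i = 0; i != segs.size(); ++i)
    out += (i ? "/" : "") + segs[i];
  return out;
}
</pre>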



 ]]></description></item>
<item><title>Arabica: Cutting October 2008 release</title><link>http://www.jezuk.co.uk/arabica/log?id=3865</link><description><![CDATA[ <p>A couple of months ago a release was, I said, <a href='http://www.jezuk.co.uk/cgi-bin/view/arabica/log?id=3825'>impending</a>.  And it really was, but then I found a niggly thing I really wanted to fix.  And went on holiday.  And got really busy at work.  And all that other stuff that happens when you're not programming.</p>
<p>There really is a release coming now, because I'm cutting it now.  The source bundles <s>will probably go</s> are up on <a href='http://sourceforge.net/projects/arabica'>Sourceforge</a> <s>this evening</s> now, and <a href='http://animal/viewvc/tags/october-2008/'>tagged in subversion</a>. Release notes should follow later this weekend or early next week.  I'll write up the niggly thing too, because it's quite a nice one.  </p>
<p>The last release was <a href='http://www.jezuk.co.uk/cgi-bin/view/arabica/log?id=3417'>just over a year ago</a>.  That's probably a bit too long.</p> ]]></description></item>
<item><title>Arabica: impending release</title><link>http://www.jezuk.co.uk/arabica/log?id=3825</link><description><![CDATA[ <p>Now my latest <a href='http://www.jezuk.co.uk/arabica/log?id=3824'>gentle stroll</a> has concluded, there are one or two platform-specific build issues to resolve.  With them done, I expect to drop a new release around the end of August or start of September.  The release will include the Taggle HTML parser and improved XSLT support, along with various little bug fixes and minor build improvements.</p>
<p>If you can't wait, there's always <a href='http://www.jezuk.co.uk/arabica/code'>the subversion repository</a>.</p> ]]></description></item>
<item><title>XSLT: Variable resolution</title><link>http://www.jezuk.co.uk/arabica/log?id=3824</link><description><![CDATA[ <p>After a bit of a break, I've spent some time hacking on Arabica again, which has been lovely.  It's really rather relaxing to just nurdle around in your own code, without any particular pressure or need.  My normal way of working on Arabica's XSLT processor is to run some of the <a href='http://www.oasis-open.org/committees/documents.php?wg_abbrev=xslt'>test suite</a>, pick a failing case, and fix it.  If I can get a few more tests passing in half an hour or an hour, and I generally can, then that's a little step further along.</p>
<p>In this latest little bit of activity, I've been focussing on variables and variable resolution.  I've fixed various problems with <a href='http://jezuk.dnsalias.net/viewvc/trunk/include/XSLT/impl/xslt_variable_stack.hpp?r1=1260&r2=1259&pathrev=1260'>circular references</a>, <a href='http://jezuk.dnsalias.net/viewvc?view=rev&revision=1264'>scoping</a>, namespace resolution, and what I thought was going to be a thorny problem with <a href='http://jezuk.dnsalias.net/viewvc?view=rev&revision=1268'>import precedence</a>.</p>
<p>What constantly surprises me is how straightforward most of these problems are, requiring only a few lines of code.  In fact this has been the story of Arabica's XSLT development.  Once the initial development push <a href='http://www.jezuk.co.uk/arabica/log?id=2953'>was done</a>, almost all the rest has been a few lines here, a few lines there.  I've been working away on this now for coming up three years, on and off and with digressions, and have no idea when I'll be done, but that doesn't bother me at all.  It's like an old pair of slippers, or a favourite woolly jumper.  It's a comfortable, gentle thing to slip into and go for a stroll in every now and again.</p>
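<p>As an illustration of how few lines circular-reference detection can take, here is a small C++ sketch.  It is not Arabica's variable stack: the class name is invented, and values are plain doubles rather than XPath values.  Variables are evaluated lazily on first use, and a set of names currently being evaluated catches any variable that, directly or indirectly, asks for its own value.</p>
<pre>
#include &lt;functional&gt;
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;stdexcept&gt;
#include &lt;string&gt;

// Hypothetical sketch, not Arabica's variable stack.  Values are plain doubles.
class VariableResolver
{
public:
  // A definition computes its value and may look up other variables.
  using Definition = std::function&lt;double(VariableResolver&amp;)&gt;;

  void define(const std::string&amp; name, Definition def) { defs_[name] = def; }

  double resolve(const std::string&amp; name)
  {
    auto done = values_.find(name);
    if (done != values_.end()) return done-&gt;second;  // already evaluated
    auto def = defs_.find(name);
    if (def == defs_.end())
      throw std::runtime_error("undefined variable $" + name);
    if (!inProgress_.insert(name).second)  // already being evaluated further up
      throw std::runtime_error("circular reference to variable $" + name);
    double value = def-&gt;second(*this);     // may call resolve() recursively
    inProgress_.erase(name);
    values_[name] = value;
    return value;
  }

private:
  std::map&lt;std::string, Definition&gt; defs_;
  std::map&lt;std::string, double&gt; values_;
  std::set&lt;std::string&gt; inProgress_;
};
</pre>
<p>Defining <code>x</code> in terms of <code>y</code> and <code>y</code> in terms of <code>x</code> makes resolving either one throw, while an ordinary chain of references simply evaluates.  If a definition throws, its name stays in the in-progress set; that is tolerable here only because an error is assumed to abort the whole transform.</p>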
 ]]></description></item>

</channel></rss>