Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Arabica: XSLT: Run away from the hills! If you see hills, run the other way!

An XSLT processor has two distinct pieces: a compiler, which reads the stylesheets and builds an executable model of some sort (the transformer); and the compiled transformer which you run against a target document. Obviously the compiler needs to know about the transformer and how to build it, but the transformer need know nothing about how it sprang into being.
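As a toy illustration of that one-way dependency (not Arabica's actual classes), here is a minimal sketch. `Transformer`, `Step`, `compile` and the two "instructions" it understands are all names invented for this example. The point is only that `compile` knows how to assemble a `Transformer`, while `Transformer` has no idea where its steps came from.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transformer: an ordered list of steps run against an input.
// It has no dependency on the compiler that built it.
class Transformer {
public:
    using Step = std::function<std::string(const std::string&)>;

    explicit Transformer(std::vector<Step> steps) : steps_(std::move(steps)) {}

    // Run every step against the input and concatenate the results.
    std::string run(const std::string& input) const {
        std::string out;
        for (const auto& step : steps_)
            out += step(input);
        return out;
    }

private:
    std::vector<Step> steps_;
};

// Hypothetical compiler: turns a list of made-up instructions into steps.
// "copy" emits the input; anything else is emitted as literal text.
Transformer compile(const std::vector<std::string>& instructions) {
    std::vector<Transformer::Step> steps;
    for (const auto& ins : instructions) {
        assert(!ins.empty());  // an empty instruction is a caller error
        if (ins == "copy")
            steps.push_back([](const std::string& in) { return in; });
        else
            steps.push_back([ins](const std::string&) { return ins; });
    }
    return Transformer(std::move(steps));
}
```

Swapping `compile` for a completely different implementation leaves `Transformer` untouched, which is the property that makes redoing only half the job possible.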

This is good for me, because it means I might only have half the job to redo. Up to now, I've been compiling stylesheets in a streaming mode using a big pile of SAX content handlers. As I encounter an xsl:element, say, I connect the SAX event stream to an xsl:element handler which creates the xsl:element object, populates and validates it, adds it to the containing object, and finally creates the next handler and connects that. It's a valid (and I think under-utilised) approach to building object graphs from XML documents - in each handler the context is well defined, the housekeeping is straightforward, the memory requirements are low. I knew there was a possibility I'd have to build the stylesheet as a DOM, if only to satisfy the document('') function. I'd hoped that maybe I could be a bit clever and only do that if that function was actually used.
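A stripped-down sketch of the handler-per-element idea, for anyone who hasn't seen it. This is not the Arabica code: `Node`, `ElementHandler` and `Builder` are hypothetical names, and the two event functions merely stand in for a real SAX ContentHandler's startElement/endElement callbacks. Each open element gets its own handler, which owns the object under construction and hands it to its parent when the element closes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compiled object: just a name and its children.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// One handler per open element. Its context is exactly one object.
class ElementHandler {
public:
    explicit ElementHandler(std::string name) { node_.name = std::move(name); }

    // Called when a child element has been completed.
    void add_child(Node child) { node_.children.push_back(std::move(child)); }

    // Called on the end tag: validation would go here, then the object is released.
    Node finish() { return std::move(node_); }

private:
    Node node_;
};

// Routes events to whichever handler is currently connected (the innermost one).
class Builder {
public:
    void start_element(const std::string& name) { handlers_.emplace_back(name); }

    void end_element() {
        assert(!handlers_.empty());  // an end tag with nothing open is malformed input
        Node done = handlers_.back().finish();
        handlers_.pop_back();
        if (handlers_.empty())
            root_ = std::move(done);                            // document element finished
        else
            handlers_.back().add_child(std::move(done));        // attach to containing object
    }

    const Node& root() const { return root_; }

private:
    std::vector<ElementHandler> handlers_;
    Node root_;
};
```

Only the handlers for currently open elements are alive at any moment, which is where the low memory requirement comes from.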

Re-reading the spec last night (reading a spec is always a good idea when trying to implement it) I realised I'd coded myself into a dead end, and it was time to turn around. Two paragraphs in particular changed my mind. In section 2.6.2 it says

"The xsl:import element is only allowed as a top-level element. The xsl:import element children must precede all other element children of an xsl:stylesheet element, including any xsl:include element children. When xsl:include is used to include a stylesheet, any xsl:import elements in the included document are moved up in the including document to after any existing xsl:import elements in the including document."

Section 11.4 says

"Both xsl:variable and xsl:param are allowed as top-level elements. A top-level variable-binding element declares a global variable that is visible everywhere."

Do you see the problem? Implementing these requirements correctly requires out-of-order processing. For a top-level variable to be visible everywhere, all the top-level variables must be processed before anything that might reference them. For imports to be moved up, you need to know the surrounding context.
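To make the import rule concrete, here is a minimal sketch of the splice it implies, working on an already-collected list of top-level children. `TopLevel` and `splice_include` are hypothetical names for illustration, and the sketch assumes the included stylesheet has already been read in full, which is precisely the surrounding context a pure streaming pass doesn't have.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a top-level child of xsl:stylesheet.
struct TopLevel {
    std::string kind;    // "import", "include", "variable", "template", ...
    std::string detail;  // href, variable name, or match pattern
};

// Replace the xsl:include at include_pos with the included stylesheet's
// children, moving the included imports up to follow the existing imports.
std::vector<TopLevel> splice_include(const std::vector<TopLevel>& including,
                                     std::size_t include_pos,
                                     const std::vector<TopLevel>& included) {
    assert(include_pos < including.size());
    assert(including[include_pos].kind == "include");

    auto is_import = [](const TopLevel& t) { return t.kind == "import"; };
    auto not_import = [](const TopLevel& t) { return t.kind != "import"; };

    std::vector<TopLevel> result;
    // 1. Imports already present in the including document.
    std::copy_if(including.begin(), including.end(), std::back_inserter(result), is_import);
    // 2. Imports hoisted out of the included document.
    std::copy_if(included.begin(), included.end(), std::back_inserter(result), is_import);
    // 3. Everything else in document order, with the include expanded in place.
    for (std::size_t i = 0; i < including.size(); ++i) {
        if (is_import(including[i]))
            continue;
        if (i == include_pos)
            std::copy_if(included.begin(), included.end(), std::back_inserter(result), not_import);
        else
            result.push_back(including[i]);
    }
    return result;
}
```

For an including sheet of import a.xsl, include inc.xsl, template / and an included sheet of import b.xsl, variable v, the result is import a.xsl, import b.xsl, variable v, template /.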

You might still be able to deal with this using streaming processing, but it becomes much more complicated. You'd have to make one pass to build the object model, then make a pass over the model itself to validate it, perhaps defer some processing, and it all starts to look a little hairy.
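The second pass might look something like this rough sketch, which checks variable references only after every top-level declaration has been seen. `Declaration` and `unresolved_references` are invented for the example, and it assumes the first pass has already extracted the variable names each declaration refers to. It also ignores local variables and parameters entirely.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical output of the first (streaming) pass.
struct Declaration {
    std::string kind;                     // "variable", "param", or "template"
    std::string name;                     // variable name or match pattern
    std::vector<std::string> references;  // variable names used in the body
};

// Returns every reference that no top-level variable or param satisfies.
std::vector<std::string> unresolved_references(const std::vector<Declaration>& top_level) {
    // First sweep: collect all globals, wherever they appear in the document.
    std::set<std::string> globals;
    for (const auto& d : top_level) {
        if (d.kind == "variable" || d.kind == "param") {
            assert(!d.name.empty());  // a variable binding must have a name
            globals.insert(d.name);
        }
    }

    // Second sweep: only now can references be validated.
    std::vector<std::string> missing;
    for (const auto& d : top_level)
        for (const auto& ref : d.references)
            if (globals.count(ref) == 0)
                missing.push_back(ref);
    return missing;
}
```

A template that uses a global declared further down the stylesheet resolves fine here, whereas a check made at the moment the template streamed past would have rejected it.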

Using a DOM, this is all much more straightforward. You'd parse the stylesheet into a DOM, and walk over that using XPath. It would make other things, like include handling, more straightforward too.

When I started writing this, I'd decided to rewrite what I'd done using a DOM, but now I'm not so sure. I think maybe I could get the SAX handling to work after all. The rearranging and reordering only needs to happen at the top level of the document. Hmm, perhaps I need to reread the spec again :)


Tagged code, arabica, xml, and c++

