Friday 21 December, 2007
Earlier this week, I outlined the equivalent XPath expressions for XSLT matches which use positions. I've been avoiding implementing this for a while for a couple of reasons. Firstly, I thought it would take more time than I generally have in one Arabica sitting, i.e. more than an hour. Secondly, I wasn't quite sure how I'd actually go about it. I knew the equivalent expressions were correct, I just didn't know how I was going to arrive at them in the code.
Happily, writing it out and then reading it back again later triggered the little flash I needed, and it turns out to be really quite easy. I've spent a bit of time on it this morning, having wrapped up the paying work for Christmas, and I've got the first pass working.
Within Arabica, each match pattern is represented as one or more steps, where each step in the pattern is represented as a TestStepExpression. A TestStepExpression contains some kind of node test (obviously) and one or more predicates. The predicate is the bit in square brackets. A match pattern like this
    a[3]
would be compiled into a TestStepExpression containing a NameNodeTest checking for 'a', and a single predicate '3'.
Finding the expressions that need rewriting is simply
    foreach step in the steplist
        foreach predicate in the step's predicatelist
            if predicate is type NUMBER
                predicate = rewrite(predicate)
The code is here, if you're interested.
This kind of transformation could be done earlier on, on the AST produced by Arabica's XPath parser. It's easier, however, to operate on the compiled version. For instance, in the case above the numeric predicate could be a number literal, the result of a function, or the result of an arbitrarily complex calculation. Detecting all those cases is actually quite tricky at the AST level. Once the compiled objects are generated, we can just check the predicate's return type.
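To make the return-type check concrete, here is a minimal C++ sketch of that loop. Everything in it is invented for illustration and is not Arabica's actual API: ValueType, Expression, NumberLiteral, Opaque, PositionEquals, Step and rewrite_numeric_predicates are all hypothetical names. The rewrite function here only wraps a numeric predicate n as position() = n, which is what a numeric predicate means in XPath; it does not attempt the full match-pattern rewrite.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for compiled XPath objects; not Arabica's real classes.
enum class ValueType { Bool, Number, String, NodeSet };

struct Expression {
  virtual ~Expression() = default;
  virtual ValueType type() const = 0;    // return type of the compiled expression
  virtual std::string text() const = 0;  // printable form, for demonstration only
};
using ExprPtr = std::shared_ptr<Expression>;

// A number literal such as the 3 in a[3].
struct NumberLiteral : Expression {
  double value;
  explicit NumberLiteral(double v) : value(v) {}
  ValueType type() const override { return ValueType::Number; }
  std::string text() const override { return std::to_string(static_cast<long>(value)); }
};

// Any other compiled expression: we only know its source text and return type.
struct Opaque : Expression {
  std::string src;
  ValueType result;
  Opaque(std::string s, ValueType t) : src(std::move(s)), result(t) {}
  ValueType type() const override { return result; }
  std::string text() const override { return src; }
};

// Placeholder rewrite result: position() = <operand>.
struct PositionEquals : Expression {
  ExprPtr operand;
  explicit PositionEquals(ExprPtr e) : operand(std::move(e)) {}
  ValueType type() const override { return ValueType::Bool; }
  std::string text() const override { return "position() = " + operand->text(); }
};

// One step of a match pattern: a node test plus its predicates.
struct Step {
  std::string node_test;
  std::vector<ExprPtr> predicates;
};

ExprPtr rewrite(const ExprPtr& predicate) {
  return std::make_shared<PositionEquals>(predicate);
}

// Replace every predicate whose return type is Number; returns how many were replaced.
std::size_t rewrite_numeric_predicates(std::vector<Step>& steps) {
  std::size_t rewritten = 0;
  for (Step& step : steps) {
    for (ExprPtr& predicate : step.predicates) {
      if (predicate->type() == ValueType::Number) {
        predicate = rewrite(predicate);
        ++rewritten;
      }
    }
  }
  return rewritten;
}
```

Note that the loop never asks how the number was produced. A literal, a function result and a calculation all report the same return type, which is the point of working on the compiled objects rather than the AST.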
Finding predicates containing calls to the last() or position() functions is going to be slightly more work, and probably slightly fiddlier. I'm off to have a crack at that now.
[An hour or so later] ... and I think that's it. Each predicate is an expression, and an expression may contain other expressions and so on. An expression might model a comparison operator, for instance, which would contain an expression each for the left- and right-hand operands. I put together a little walker to zoom through this expression tree looking for instances of the last or position functions. If it finds one, then I just generate the rewrite expression in exactly the same way as before.
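A walker of that sort might look something like the following C++ sketch. As before, this is an illustration under assumptions rather than the real thing: Expr, Literal, FunctionCall, BinaryOp and uses_position_or_last are names made up for the example, and Arabica's own expression classes are organised differently.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree; not Arabica's real classes.
struct Expr {
  virtual ~Expr() = default;
  // Sub-expressions: operands of an operator, arguments of a function, and so on.
  virtual std::vector<const Expr*> children() const { return {}; }
  // The function name if this node is a function call, otherwise empty.
  virtual std::string function_name() const { return ""; }
};
using ExprPtr = std::unique_ptr<Expr>;

// A leaf with no sub-expressions, e.g. a number or string literal.
struct Literal : Expr {};

struct FunctionCall : Expr {
  std::string name;
  std::vector<ExprPtr> args;
  explicit FunctionCall(std::string n) : name(std::move(n)) {}
  std::vector<const Expr*> children() const override {
    std::vector<const Expr*> out;
    for (const ExprPtr& arg : args) out.push_back(arg.get());
    return out;
  }
  std::string function_name() const override { return name; }
};

// A comparison or arithmetic operator with left- and right-hand operands.
struct BinaryOp : Expr {
  ExprPtr lhs;
  ExprPtr rhs;
  BinaryOp(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
  std::vector<const Expr*> children() const override {
    return {lhs.get(), rhs.get()};
  }
};

// Depth-first search of the tree for a call to position() or last().
bool uses_position_or_last(const Expr& e) {
  const std::string fn = e.function_name();
  if (fn == "position" || fn == "last") return true;
  for (const Expr* child : e.children()) {
    if (child != nullptr && uses_position_or_last(*child)) return true;
  }
  return false;
}
```

The walker stops at the first hit, since one occurrence anywhere in the predicate is enough to mean the whole predicate needs rewriting.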