Recently, I've been toying with the idea of adding XPath support to Arabica. Using a DOM for real work without using XPath almost never seems to happen, for me at least. The longer Arabica's DOM went without XPath, the less useful it seemed to be.
Last night, I had a very brief browse around various XPath engines (Saxon, Jaxen, Blue, XPath for ActionScript and one or two more), to see how they parsed XPath expressions. Jaxen, for instance, uses a SAX-like approach to tokenise the expression. Saxon's parser is all written by hand. I saw one, possibly two, that were generated by Yacc- or Bison-like tools.
Parsing the expression is the thing I'm most concerned about. If you can pick the right bits out, then actually doing the work is going to be easy :). Constructing a parser by hand is a big pain. You've got all kinds of state to manage, lookahead to worry about, and all kinds of other things. Plus, I want the whole thing to be (eventually) templated on string type like the rest of Arabica, which adds an extra wrinkle. Further, given a fancy hand-crafted parser, how do you validate it against the grammar in the rec without building a massive pile of test cases? And if you do make a mistake, how easy will it be to fix?
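To give a flavour of the bookkeeping involved, here is a minimal sketch of a hand-written recogniser for a tiny, simplified slice of the path syntax. It is purely illustrative: the class name TinyPathParser, the cut-down grammar in the comment, and the letters-only names are all assumptions made for the example, not anything taken from Arabica or from the engines mentioned above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hand-written recogniser for a simplified slice of XPath:
//   Path ::= Step (('/' | '//') Step)*
//   Step ::= '.' | '..' | Name
// Name is cut down to ASCII letters; the real grammar is far richer.
template <class String>
class TinyPathParser {
public:
    explicit TinyPathParser(const String& s) : in_(s), pos_(0) {}

    // True if the whole input matches Path; steps() then holds each step.
    bool parse() {
        if (!step()) return false;
        while (pos_ < in_.size() && in_[pos_] == '/') {
            ++pos_;
            // One character of lookahead to tell '/' from '//'.
            if (pos_ < in_.size() && in_[pos_] == '/') ++pos_;
            if (!step()) return false;
        }
        return pos_ == in_.size();
    }

    const std::vector<String>& steps() const { return steps_; }

private:
    // Consume one Step, remembering where it started.
    bool step() {
        std::size_t start = pos_;
        if (pos_ < in_.size() && in_[pos_] == '.') {
            ++pos_;
            // Lookahead again: '.' versus '..'.
            if (pos_ < in_.size() && in_[pos_] == '.') ++pos_;
        } else {
            while (pos_ < in_.size() && is_letter(in_[pos_])) ++pos_;
        }
        if (pos_ == start) return false;  // nothing consumed, so no step here
        steps_.push_back(in_.substr(start, pos_ - start));
        return true;
    }

    template <class Char>
    static bool is_letter(Char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    String in_;                  // the expression being parsed
    std::size_t pos_;            // current read position
    std::vector<String> steps_;  // steps recognised so far
};
```

Constructing a TinyPathParser<std::string> over "a//b/../c" and calling parse() should accept the input and leave four steps behind, and the same template works with std::wstring. Even at this size there is a position to track and two places where lookahead matters, and nothing in the code looks much like the grammar it is meant to implement.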
The XPath rec defines the expression grammar using EBNF, which isn't really a surprise. There are tools like lex and yacc that will convert EBNF to code for you. I've never used them though, and didn't fancy starting to learn them last night. I have played with Spirit though. Spirit is a parser toolkit which lets you pretty much transcribe EBNF directly into your C++ source. It's barkingly clever while at the same time being really very simple. (It'd also make a good counter-example to the operator-overloading-is-a-really-bad-idea argument, but that's a whole separate issue.) I pulled the latest release (it's now part of Boost) and set off. The latest release really is a piece of cake to use, so I spent a happy couple of hours with the XPath rec on one side of the screen and vi on the other, transcribing XPath EBNF into C++. Sounds silly to say, but I had a really fun time.
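To show the general idea of transcribing EBNF through operator overloading, here is a small self-contained sketch. It is not Spirit and does not use Boost: Rule, ch, range, make_path and matches are hypothetical names invented for this example, and the grammar is a simplified toy rather than the one in the XPath rec. It borrows the operator conventions Spirit uses: >> for sequence, | for alternatives, prefix * for zero-or-more and prefix + for one-or-more.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// A rule tries to match at position i and advances i on success.
struct Rule {
    std::function<bool(const std::string&, std::size_t&)> match;
};

// Match one literal character.
Rule ch(char c) {
    return Rule{[c](const std::string& s, std::size_t& i) {
        if (i < s.size() && s[i] == c) { ++i; return true; }
        return false;
    }};
}

// Match any single character in the inclusive range [lo, hi].
Rule range(char lo, char hi) {
    return Rule{[lo, hi](const std::string& s, std::size_t& i) {
        if (i < s.size() && s[i] >= lo && s[i] <= hi) { ++i; return true; }
        return false;
    }};
}

// Sequence (EBNF: a b). Restores the position if either half fails.
Rule operator>>(Rule a, Rule b) {
    return Rule{[a, b](const std::string& s, std::size_t& i) {
        std::size_t save = i;
        if (a.match(s, i) && b.match(s, i)) return true;
        i = save;
        return false;
    }};
}

// Ordered alternative (EBNF: a | b). The first rule that matches wins.
Rule operator|(Rule a, Rule b) {
    return Rule{[a, b](const std::string& s, std::size_t& i) {
        return a.match(s, i) || b.match(s, i);
    }};
}

// Zero or more (EBNF: a*). Stops if a matches without consuming input.
Rule operator*(Rule a) {
    return Rule{[a](const std::string& s, std::size_t& i) {
        for (;;) {
            std::size_t before = i;
            if (!a.match(s, i) || i == before) break;
        }
        return true;
    }};
}

// One or more (EBNF: a+).
Rule operator+(Rule a) { return a >> *a; }

// Toy grammar, assumed for illustration only:
//   Name ::= Letter+
//   Step ::= '..' | '.' | Name
//   Path ::= Step ('/' Step)*
Rule make_path() {
    Rule letter = range('a', 'z') | range('A', 'Z');
    Rule name = +letter;
    Rule step = (ch('.') >> ch('.')) | ch('.') | name;
    return step >> *(ch('/') >> step);
}

// True only if the rule consumes the entire input.
bool matches(const Rule& r, const std::string& s) {
    std::size_t i = 0;
    return r.match(s, i) && i == s.size();
}
```

With this in place, matches(make_path(), "a/b/../c") should succeed, while "a//b" is rejected because the toy grammar has no '//' rule. The point is that the body of make_path reads almost line for line like the EBNF in the comment above it, which makes checking the code against the grammar far less of a chore.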