Oracle, one of software's 800-pound gorillas, seems to be employing monkeys, at least as far as their Java XML parser development is concerned. I've been using it for about two hours and have turned up two separate bugs. Actually, not bugs, they're misfeatures - things which have not been implemented correctly simply because spec text was ignored.
I recently delivered a Java jar file packaging up the XSP portion of the Apache Cocoon XML Web Development Framework. Part of that brief included ensuring that the finished package did not, as Cocoon as a whole appeared to, rely on specific versions of the Xerces XML parser. Actually, that turned out to be a requirement to run with Oracle's XML parser under the OC4J application server. Tests which passed using Xerces failed with the Oracle parser, because the Oracle parser failed to add the one line that would implement the SAX spec correctly.
From the SAX website:
public final class XMLReaderFactory
Factory for creating an XML reader.
... snip ...
Note to Distributions bundled with parsers: You should modify the implementation of the no-arguments createXMLReader to handle cases where the external configuration mechanisms aren't set up. That method should do its best to return a parser when one is in the class path, even when nothing bound its class name to
org.xml.sax.driver so those configuration mechanisms would see it.
XMLReaderFactory can be implemented in about 20 lines of code. Unfortunately Oracle has the 19 line version because it doesn't modify createXMLReader. One of the goals of SAX is to allow one parser to be transparently swapped for another, without having to change anything. Wrongly implementing
XMLReaderFactory breaks that.
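For illustration, here is a rough sketch of the kind of application-side fallback that would have sidestepped the problem. It is not Oracle's code and not the exact change I made: the class name `ReaderFallback` is made up, and falling back to JAXP's `SAXParserFactory` is my own assumption about a reasonable second attempt when nothing has bound `org.xml.sax.driver`.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: the name and the JAXP fallback are illustrative assumptions.
class ReaderFallback {

    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
    static XMLReader createReader() throws Exception {
        try {
            // The portable SAX2 route: relies on org.xml.sax.driver or on the
            // bundled parser supplying a sensible default.
            return XMLReaderFactory.createXMLReader();
        } catch (SAXException noDriverConfigured) {
            // Nothing was configured and the distribution gave no default,
            // so ask JAXP for whatever parser is on the class path instead.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }
}
```

The point of the SAX note is that callers should never need a wrapper like this; the parser distribution is supposed to do the equivalent inside `createXMLReader` itself.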
Having determined this was my problem, I was able to make the necessary configuration change (a minor change, granted, but still one I shouldn't have had to make). The tests passed, bar one. The test reported a stray processing instruction.
Where could that be coming from?
Dumping the output revealed no processing instruction, although the document did start with
<?xml version="1.0" encoding="UTF-8"?>, a perfectly reasonable and correct XML declaration. Oracle's XML parser incorrectly reports this as a processing instruction. In case anyone is still reading at this point, the relevant spec text is
XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
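The PITarget production excludes the name "xml" in any case combination, so the XML declaration can never be a processing instruction. A small check along the following lines makes the difference visible. It is only a sketch: the class name `PiCheck` and the sample document are invented for illustration, and it uses whichever parser JAXP finds rather than Oracle's.

```java
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper: records every processing instruction the parser reports.
class PiCheck {

    static List<String> reportedPiTargets(String xml) throws Exception {
        final List<String> targets = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                // A conforming parser never calls this for the XML declaration.
                targets.add(target);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return targets;
    }

    public static void main(String[] args) throws Exception {
        // One XML declaration followed by one genuine processing instruction.
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<?real-pi some data?><root/>";
        System.out.println(reportedPiTargets(doc));
    }
}
```

A conforming parser should report only `real-pi` here. A parser with the misfeature described above would also report a target of `xml`.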
I'll have a hunt around to see if these are known bugs at Oracle, and if not report them, although I'm certain any reply would be "upgrade to version XXXX". Sadly that's not an option my client can entertain at the moment. The next time someone tells you that proprietary software means that someone is accountable, recall this little rant. If I'd found a similar bug in an open source library, I could have fixed it myself in less time than I've taken to write this.