Monday 29 October, 2007
A couple of months ago I published some results from running Arabica against part of the OASIS XSLT conformance test suite. I've done a bit of work since then, so it's time to update the numbers.
| Category | Run | Failures | Errors | Skips |
|---|---|---|---|---|
| attribvaltemplate | 12 | 0 | 0 | 1 |
| axes | 130 | 0 | 0 | 2 |
| boolean | 90 | 0 | 0 | 1 |
| conditional | 23 | 0 | 0 | 0 |
| conflictres | 35 | 0 | 0 | 1 |
| copy | 62 | 0 | 0 | 0 |
| dflt | 4 | 0 | 0 | 0 |
| expression | 6 | 0 | 0 | 6 |
| extend | 4 | 0 | 0 | 4 |
| impincl | 29 | 3 | 0 | 2 |
| lre | 22 | 11 | 0 | 0 |
| match | 32 | 14 | 0 | 1 |
| math | 107 | 1 | 0 | 0 |
| mdocs | 18 | 0 | 0 | 7 |
| message | 16 | 2 | 0 | 2 |
| modes | 17 | 0 | 0 | 0 |
| namedtemplate | 19 | 0 | 0 | 1 |
| namespace | 133 | 39 | 0 | 0 |
| node | 21 | 0 | 0 | 0 |
| output | 108 | 78 | 0 | 1 |
| position | 111 | 7 | 0 | 15 |
| predicate | 58 | 0 | 0 | 0 |
| processorinfo | 1 | 0 | 0 | 1 |
| reluri | 11 | 1 | 0 | 2 |
| select | 85 | 0 | 0 | 6 |
| sort | 37 | 7 | 0 | 10 |
| string | 133 | 4 | 0 | 8 |
| variable | 70 | 7 | 0 | 0 |
| ver | 5 | 0 | 0 | 4 |
| whitespace | 22 | 0 | 0 | 10 |
| Total | 1421 | 174 | 0 | 93 |
Since the last published results, I have one more skip and 20 fewer failures. My little spreadsheet (the first I have ever constructed, career fact fans) says I'm running 1328 tests altogether (the 1421 listed, less the 93 skips), with a pass rate of 86.9%.
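For anyone without a spreadsheet to hand, the arithmetic can be checked with a few lines of Python. This is only a sketch: the figures are copied from the Total row, and it assumes the Run column counts skipped tests too, so the number of tests actually executed is Run minus Skips. The function name `pass_rate` is made up for illustration.

```python
# Totals copied from the "Total" row of the table.
TOTAL_LISTED = 1421
FAILURES = 174
ERRORS = 0
SKIPS = 93


def pass_rate(listed, failures, errors, skips):
    """Return (tests executed, tests passed, pass rate in percent).

    Assumes the listed count includes skipped tests, so the number
    actually executed is listed minus skips.
    """
    executed = listed - skips
    passed = executed - failures - errors
    return executed, passed, 100.0 * passed / executed


executed, passed, rate = pass_rate(TOTAL_LISTED, FAILURES, ERRORS, SKIPS)
print(f"{executed} executed, {passed} passed, {rate:.1f}% pass rate")
# prints "1328 executed, 1154 passed, 86.9% pass rate"
```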
A failure means the test ran but did the wrong thing. An error means it threw an exception, didn't compile the XSLT, or did something similarly unexpected. A skip means the test deliberately wasn't run because of some known deficiency in my code. It might need a feature I haven't implemented, the test might be just plain wrong (there are a couple of these), it might be Xalan-specific, or it might be some other thing. Skips come in three flavours: don't bother at all, shouldn't compile, or shouldn't run. If a test that's not expected to compile does, or one that shouldn't run suddenly starts working, that's actually flagged as a failure. There aren't any tests doing this in these results.
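The rules above can be sketched as a small classification function. This is an illustration rather than the harness's actual code. The `Skip` enum, the `classify` function and its boolean inputs are all hypothetical names. It also assumes that a "shouldn't run" test counts as "working" as soon as it compiles and runs without error, whatever it outputs.

```python
from enum import Enum


class Skip(Enum):
    """Hypothetical markers for the three skip flavours, plus 'not skipped'."""
    NO = 0
    DONT_BOTHER = 1
    SHOULD_NOT_COMPILE = 2
    SHOULD_NOT_RUN = 3


def classify(skip, compiles, runs, output_ok):
    """Classify one test as 'pass', 'failure', 'error' or 'skip'.

    compiles, runs and output_ok are booleans describing what happened
    when the test was attempted (ignored for DONT_BOTHER).
    """
    if skip is Skip.DONT_BOTHER:
        return "skip"
    if skip is Skip.SHOULD_NOT_COMPILE:
        # Compiling unexpectedly is flagged as a failure.
        return "failure" if compiles else "skip"
    if skip is Skip.SHOULD_NOT_RUN:
        # Assumption: "starts working" means it compiled and ran cleanly.
        return "failure" if (compiles and runs) else "skip"
    if not (compiles and runs):
        # Didn't compile, threw an exception, or similar.
        return "error"
    return "pass" if output_ok else "failure"


print(classify(Skip.NO, True, True, False))                  # failure
print(classify(Skip.SHOULD_NOT_COMPILE, True, False, False)) # failure
print(classify(Skip.SHOULD_NOT_RUN, True, False, False))     # skip
```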
Not every failure represents a unique bug. Similarly, not every skip represents a unique deficiency. I haven't investigated the biggest set of failed tests, the 78 output failures, in depth, but I suspect many of them are related to either HTML output (which I don't do) or text output (which the test harness can't currently compare).
These results are from current Subversion head, built on Windows XP using Visual Studio 8 and expat.
