Friday 09 December, 2005
One obvious memory optimisation for DOM trees is string pooling. In any reasonable XML document, there will be a lot of repeated element and attribute names. Rather than every Element node in your tree having its own copy of that name (and quite possibly a namespace URI too), you keep all the names in a common table which the nodes can then point into.
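To make the idea concrete, here is a minimal sketch in modern C++. It is not Arabica's actual code: StringPool, ElementNode, make_element and pool_demo are names made up for illustration, and the pool is assumed to outlive every node that points into it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical string pool, for illustration only (not Arabica's implementation).
class StringPool {
public:
    // Returns a pointer to the pooled copy of `s`, adding it the first time it is seen.
    // Pointers to elements of a std::unordered_set stay valid when the set rehashes,
    // so nodes can keep them for as long as the pool itself is alive.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical element node: it holds pointers into the pool
// instead of owning its own copies of the name and namespace URI.
struct ElementNode {
    const std::string* name;
    const std::string* namespace_uri;
};

inline ElementNode make_element(StringPool& pool,
                                const std::string& name,
                                const std::string& namespace_uri) {
    return ElementNode{pool.intern(name), pool.intern(namespace_uri)};
}

// Small usage example: two elements with the same name share one pooled string.
inline void pool_demo() {
    StringPool pool;
    ElementNode first = make_element(pool, "para", "http://example.com/ns");
    ElementNode second = make_element(pool, "para", "http://example.com/ns");
    assert(first.name == second.name);
    assert(pool.size() == 2);  // one name plus one namespace URI
}
```

With something like this, a thousand para elements cost a thousand pairs of pointers plus a single copy of each distinct string, rather than a thousand copies of the name and URI.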
When I first wrote Arabica's DOM I didn't bother, mainly because I was more concerned with the memory management aspects of the tree itself, and making sure it got cleaned up properly. After that, I still didn't bother with it, because on the kind of small documents I was working with memory wasn't an issue, and there was always something more exciting to do.
Anyway, I finally implemented string pooling for Element and Attribute names and namespace URIs this morning. It took less than two hours. I am a twit.