Tuesday 02 September, 2003
More codecvt facet renaming
I, like many people, use the terms UTF16 and UCS2 interchangeably when talking about slinging Unicode text around as wide characters, even though I knew there was a difference between the two. Here's what the Unicode glossary says. All clear? No, I didn't think so either. However, with Arabica (and XML processing more generally) it's sometimes helpful to distinguish between Unicode text held as wide characters and the byte sequence that encodes those wide characters. Therefore, I'm going to use UTF16 to mean a byte sequence, and UCS2 to mean a character sequence. This is still probably not quite right, but I'll take my chances.
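To make the character-sequence/byte-sequence distinction concrete, here is a small sketch. It is not Arabica code: the function name ucs2_to_utf16_bytes is hypothetical, and it assumes every wide character holds a UCS2 value in the range 0x0000-0xFFFF. The same two-character string produces two different four-byte sequences depending on the byte order chosen.

```cpp
#include <string>

// Hypothetical helper, for illustration only (not part of Arabica).
// Input: a UCS2 character sequence held as wide characters.
// Output: the UTF16 byte sequence encoding it, in the requested byte order.
// Assumes each character is in 0x0000-0xFFFF; higher bits are masked off.
std::string ucs2_to_utf16_bytes(const std::wstring& chars, bool big_endian)
{
  std::string bytes;
  bytes.reserve(chars.size() * 2);
  for (wchar_t c : chars)
  {
    const unsigned long value = static_cast<unsigned long>(c);
    const char hi = static_cast<char>((value >> 8) & 0xFF);
    const char lo = static_cast<char>(value & 0xFF);
    if (big_endian) { bytes += hi; bytes += lo; }  // UTF16BE: high byte first
    else            { bytes += lo; bytes += hi; }  // UTF16LE: low byte first
  }
  return bytes;
}
```

For L"Hi" (U+0048, U+0069) the character sequence is the same either way, but the big-endian bytes are 00 48 00 69 and the little-endian bytes are 48 00 69 00.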

The upshot of this is that I've renamed utf8utf16codecvt to utf8ucs2codecvt. I've also committed two new codecvts, utf16beucs2codecvt and utf16leucs2codecvt, which perform UCS2 to big-endian and little-endian UTF16 conversion respectively.
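As a rough illustration of what a UCS2-to-UTF16 codecvt facet has to do, here is a minimal sketch built on the standard std::codecvt<wchar_t, char, std::mbstate_t> interface. It is not the Arabica source: the class name ucs2_utf16_codecvt and its BigEndian template parameter are hypothetical, and one template stands in for both the big-endian and little-endian facets. It assumes wide characters hold UCS2 values (0x0000-0xFFFF) and does no surrogate handling.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical sketch, not Arabica's implementation.
// Internal side: UCS2 held in wchar_t. External side: UTF16 bytes,
// big-endian when BigEndian is true, little-endian otherwise.
template <bool BigEndian>
class ucs2_utf16_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
{
public:
  explicit ucs2_utf16_codecvt(std::size_t refs = 0)
    : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
  // UCS2 -> UTF16: every character becomes exactly two bytes.
  result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                const wchar_t*& from_next, char* to, char* to_end,
                char*& to_next) const override
  {
    for (from_next = from, to_next = to; from_next != from_end; ++from_next)
    {
      const unsigned long ch = static_cast<unsigned long>(*from_next);
      if (ch > 0xFFFF) return error;            // outside UCS2's range
      if (to_end - to_next < 2) return partial;  // no room for both bytes
      const char hi = static_cast<char>((ch >> 8) & 0xFF);
      const char lo = static_cast<char>(ch & 0xFF);
      *to_next++ = BigEndian ? hi : lo;
      *to_next++ = BigEndian ? lo : hi;
    }
    return ok;
  }

  // UTF16 -> UCS2: bytes are consumed in pairs; a lone trailing byte is
  // left unconsumed and reported as partial.
  result do_in(std::mbstate_t&, const char* from, const char* from_end,
               const char*& from_next, wchar_t* to, wchar_t* to_end,
               wchar_t*& to_next) const override
  {
    for (from_next = from, to_next = to; from_end - from_next >= 2; from_next += 2)
    {
      if (to_next == to_end) return partial;  // output buffer full
      const unsigned b0 = static_cast<unsigned char>(from_next[0]);
      const unsigned b1 = static_cast<unsigned char>(from_next[1]);
      *to_next++ = static_cast<wchar_t>(BigEndian ? ((b0 << 8) | b1)
                                                  : ((b1 << 8) | b0));
    }
    return from_next == from_end ? ok : partial;
  }

  // Stateless encoding, so there is nothing to flush.
  result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override
  {
    to_next = to;
    return noconv;
  }

  // Fixed width: always two external bytes per internal character.
  int do_encoding() const noexcept override { return 2; }
  bool do_always_noconv() const noexcept override { return false; }
  int do_max_length() const noexcept override { return 2; }

  // Bytes needed to produce at most max characters from [from, from_end).
  int do_length(std::mbstate_t&, const char* from, const char* from_end,
                std::size_t max) const override
  {
    std::size_t pairs = static_cast<std::size_t>(from_end - from) / 2;
    if (pairs > max) pairs = max;
    return static_cast<int>(pairs * 2);
  }
};
```

To read a UTF16BE file through such a facet, one would typically imbue it into a wide stream before opening the file, e.g. `in.imbue(std::locale(std::locale::classic(), new ucs2_utf16_codecvt<true>));` on a `std::wifstream`, opened in binary mode to avoid newline translation. Because refs defaults to 0, the locale takes ownership of the facet.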


Jez Higgins