No full blog post this time. I just wanted to point you towards an excellent article by Joel Spolsky that I read yesterday: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The article is a short and highly readable introduction to the different ways text can be encoded, and to why the concept of “plain text” is not only a myth, but a dangerous one. The article itself dates from 2003, but the information is definitely still relevant, and I am ashamed to say that I was ignorant of the implications of text encodings for a long time. I don’t know whether Joel’s threat to make me “peel onions for 6 months in a submarine” still stands, but if it does, I’d best start packing.
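To make the “plain text is a myth” point concrete, here is a tiny Python illustration of my own (it is not taken from Joel’s article). The same string becomes different bytes under UTF-8 and Latin-1, and decoding bytes with the wrong encoding quietly produces garbage instead of an error. The choice of "café" and of these two encodings is just an example.

```python
text = "café"

# Same characters, different bytes depending on the encoding chosen
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (5 bytes: é takes two)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9' (4 bytes: é takes one)

# Reading UTF-8 bytes as if they were Latin-1 does not fail; it just
# turns the two bytes of "é" into two unrelated characters ("mojibake")
misread = utf8_bytes.decode("latin-1")  # 'cafÃ©'
```

Nothing in the bytes themselves says which encoding was used, which is exactly why you have to know it (or be told it) before you can read text correctly.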
Joel’s article is an absolute must-read if you ever write any code at all. It won’t give you all the answers, but it will hopefully leave you knowing what you don’t know, and that, I believe, is half the battle.
Now go click that link!