The Voidspace Techie Blog

If you end up calling "decode" on Unicode objects, your application is not I18N-safe in any way.

"decode" and "encode" are used to convert between encoded 8-bit strings and Unicode strings: "decode" turns an 8-bit string into a Unicode string, and "encode" goes the other way. Unicode strings are already decoded. That's the whole point of using Unicode.

If you call "decode" on a Unicode string anyway, Python first converts the string back to an 8-bit string using the ASCII codec, and then "decodes" it again. This doesn't work if the Unicode string contains non-ASCII text, and it doesn't work if the string contains "encoded" non-ASCII text, which makes the whole operation useless. (And the only way to end up with "encoded" data in a Unicode string is to mess up on the way in.)
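To make that implicit round trip concrete, here is a rough sketch written for Python 3, where text strings no longer have a "decode" method at all. The helper name py2_style_decode is made up for illustration; it only imitates the hidden ASCII step described above and is not part of any library.

```python
def py2_style_decode(text, encoding):
    """Imitate the old behaviour: implicit ASCII encode, then the real decode.

    Hypothetical helper for illustration only.
    """
    raw = text.encode("ascii")   # hidden step; fails on any non-ASCII character
    return raw.decode(encoding)


# The intended direction: 8-bit data comes in, gets decoded once, stays Unicode.
raw = b"caf\xc3\xa9"             # UTF-8 bytes for "café"
text = raw.decode("utf-8")

# Pure ASCII text survives the hidden step, so the mistake goes unnoticed.
ascii_ok = py2_style_decode("abc", "utf-8")

# Non-ASCII text blows up in the hidden step, before any decoding happens.
try:
    py2_style_decode(text, "utf-8")
    failed = False
except UnicodeEncodeError:
    failed = True
```

The point of the sketch is that the error comes from the conversion back to ASCII, not from the codec you asked for, which is why the failure only shows up once real non-ASCII data arrives.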


Thanks.

I was using it to encode something with 'rot13' that had already been 'base64' encoded, so I knew it was ASCII only... but it was still wrong!


Aha. I've always considered mixing text encodings with "base64" and "rot13" stuff to be a design flaw...

(If that mistake hadn't been made, nobody would have come up with the bright idea of adding a "decode" method to the decoded string type, you would have used the appropriate function instead, and all would have been well...)
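As one possible reading of "the appropriate function", here is a small Python 3 sketch that uses the base64 module for the byte-level transform and codecs.encode with the "rot_13" codec for the text-level one. The payload and variable names are made up for the example; this is not necessarily what either commenter had in mind.

```python
import base64
import codecs

payload = b"hello"                                   # example 8-bit data (made up)

# base64 works on bytes; its output is ASCII-only, so decoding it as ASCII is safe.
b64_text = base64.b64encode(payload).decode("ascii")

# rot_13 is a text-to-text transform, so it goes through codecs.encode/decode
# rather than through a string method.
scrambled = codecs.encode(b64_text, "rot_13")

# Reverse the two steps in the opposite order to get the original bytes back.
restored = base64.b64decode(codecs.decode(scrambled, "rot_13"))
```

Keeping the two kinds of transform in separate functions means the text codecs are only ever used for converting between 8-bit strings and Unicode strings.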

