Python Utf-8 Latin-1 Displays Wrong Character
Solution 1:
Your source code is encoded as UTF-8, but you are decoding the data as Latin-1. Don't do that; you are creating mojibake.
Decode from UTF-8 instead, and don't encode again. print will write to sys.stdout, which will have been configured with your terminal or console codec (detected when Python starts).
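A minimal sketch of the decode-once approach. It is written for Python 3, which is an assumption: the session in this answer is Python 2, where a plain 'å' literal is already a bytestring. The names raw and text are illustrative only.

```python
import sys

raw = b'\xc3\xa5'           # UTF-8 bytes for å, as read from a UTF-8 source file
text = raw.decode('utf-8')  # decode once, with the codec that produced the bytes

print(len(raw), len(text))  # two bytes, one character
print(sys.stdout.encoding)  # codec print() uses for output, detected at startup
```

From here, print(text) needs no manual .encode() call: print hands the text to sys.stdout, which encodes it for the terminal.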
My terminal is configured for UTF-8, so when I enter the å character in my terminal, UTF-8 data is produced:
>>> 'å'
'\xc3\xa5'
>>> 'å'.decode('latin1')
u'\xc3\xa5'
>>> print 'å'.decode('latin1')
Ã¥
You can see that the character uses two bytes; when saving your Python source with an editor configured to use UTF-8, Python reads the exact same bytes from disk to put into your bytestring.
Decoding those two bytes as Latin-1 produces two Unicode code points, one per byte, because Latin-1 maps every byte to the code point with the same value: U+00C3 (Ã) and U+00A5 (¥).
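The byte-to-code-point mapping can be checked directly. The sketch below again assumes Python 3, and its variable names are hypothetical. It also shows that text already mis-decoded this way can usually be recovered by reversing the Latin-1 step, although decoding correctly in the first place is the better fix.

```python
raw = b'\xc3\xa5'                 # UTF-8 bytes for å

mojibake = raw.decode('latin-1')  # wrong codec: each byte becomes one character
code_points = [hex(ord(ch)) for ch in mojibake]  # ['0xc3', '0xa5']

# Latin-1 maps every byte to the code point with the same value, so encoding
# back to Latin-1 recovers the original bytes, which then decode as UTF-8.
repaired = mojibake.encode('latin-1').decode('utf-8')
```

Here mojibake holds the two characters Ã¥, while repaired holds the single character å.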
You probably want to read up on the difference between Unicode and encodings, and how that relates to Python.