Why Can't I Decode \xdf (ß) Into Utf-8?
I have a bytestring b'\xDF'. When I try to decode it as UTF-8, a UnicodeDecodeError is thrown. Decoding it as CP1252 works fine. In both charsets, 0xDF is represented by the character ß.
Solution 1:
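All single-byte encoded characters in UTF-8 have to be in the range [0x00 .. 0x7F] (https://en.wikipedia.org/wiki/UTF-8). Those are equivalent to 7-bit ASCII. The byte 0xDF lies outside that range: in UTF-8 it is the first byte of a two-byte sequence, so on its own it is incomplete and cannot be decoded.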
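A quick way to check the single-byte range is to try decoding every possible byte value on its own. This is only a sketch; the helper name `decodes_alone` is made up for illustration.

```python
def decodes_alone(value: int) -> bool:
    """Return True if the single byte `value` is valid UTF-8 by itself."""
    try:
        bytes([value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Collect every byte value that decodes successfully on its own.
valid = [v for v in range(256) if decodes_alone(v)]
print(hex(min(valid)), hex(max(valid)))  # 0x0 0x7f
print(decodes_alone(0xDF))               # False
```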
For the German ß, you'd get 2 bytes in UTF-8:
>>> "ß".encode("utf-8")
b'\xc3\x9f'
Which also works correctly when decoding:
>>> b'\xc3\x9f'.decode("utf-8")
'ß'
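If the data really is CP1252 (as the question suggests), one possible way to get UTF-8 bytes is to decode with the source encoding first and then re-encode. A minimal sketch, assuming the input is the single byte from the question; the variable names are illustrative only:

```python
raw = b"\xdf"  # the byte from the question

# CP1252 is a single-byte encoding, so 0xDF decodes directly to "ß".
text = raw.decode("cp1252")
print(text == "ß")  # True

# Re-encoding the resulting string yields the two-byte UTF-8 form.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc3\x9f'

# Decoding the lone byte as UTF-8 fails, because 0xDF only starts a
# two-byte sequence and no continuation byte follows.
try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc.reason)
```

Decoding always needs the encoding the bytes were actually written in; the target encoding only matters when encoding the string again.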