
Weird Characters In String

I am reading some data from a file, but I am seeing some weird characters, such as 'tamb\xc3\xa9m', 'f\xc3\xbcr', and 'cari\xc3\xb1o'. My file read code is fairly standard: w

Solution 1:

You have UTF-8 encoded data. You could decode the data:

with open(filename) as f:
    for line in f:
        # Python 2: each line is a byte string; decode it to unicode
        print line.decode('utf8')

or use io.open() to have Python decode the contents for you, as you read:

import io

with io.open(filename, encoding='utf8') as f:
   for line in f:
       print line
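The snippets above use Python 2 syntax. As a rough sketch of the same two approaches in Python 3 (an assumption, since the question does not state a version), the built-in open() accepts an encoding argument directly. The sample bytes and the temporary file are hypothetical stand-ins for the asker's data file:

```python
import os
import tempfile

# Hypothetical sample data: the UTF-8 bytes from the question, one word per line.
raw = b"tamb\xc3\xa9m\nf\xc3\xbcr\ncari\xc3\xb1o\n"

# Write the bytes to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    filename = tmp.name

try:
    # Approach 1: read raw bytes and decode each line explicitly.
    with open(filename, "rb") as f:
        decoded_manually = [line.decode("utf8").rstrip("\n") for line in f]

    # Approach 2: let open() decode the contents as they are read.
    with open(filename, encoding="utf8") as f:
        decoded_on_read = [line.rstrip("\n") for line in f]
finally:
    os.remove(filename)

# Both approaches yield the same decoded text.
print(decoded_manually == decoded_on_read)
```

Passing the encoding to open() is usually the simpler choice, because every line arrives already decoded and no later code has to remember to call decode().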

Your data, decoded:

>>> print 'tamb\xc3\xa9m'.decode('utf8')
também
>>> print 'f\xc3\xbcr'.decode('utf8')
für
>>> print 'cari\xc3\xb1o'.decode('utf8')
cariño

You appear to have printed string representations (the output of the repr() function), which use string literal syntax suitable for pasting back into your Python interpreter. In that syntax, \xhh hex escapes stand for bytes outside the printable ASCII range. Python containers such as list or dict also use repr() to show their contents when printed, so the escapes appear whenever you print a list of such strings.
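A small sketch of that repr() behaviour, written for Python 3 (an assumption; there the undecoded data is a bytes object, whereas in Python 2 it would be a plain str). The variable names are illustrative only:

```python
raw = b"tamb\xc3\xa9m"        # undecoded UTF-8 bytes, as in the question
text = raw.decode("utf8")     # the decoded text

# repr() produces literal syntax; non-ASCII bytes appear as \xhh escapes.
raw_repr = repr(raw)

# Containers format their items with repr(), so the escapes show up here too.
list_repr = str([raw])

print(raw_repr)               # b'tamb\xc3\xa9m'
print(list_repr)              # [b'tamb\xc3\xa9m']

# The escapes are only a display form: 7 bytes that decode to 6 characters.
print(len(raw), len(text))    # 7 6
```

The data itself is unchanged by how it is displayed; only decoding turns the two bytes \xc3\xa9 into the single character é.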

You may want to read up on Unicode and how it interacts with Python.
