How To Convert Repr Into Encoded String
Solution 1:
Your solution is OK; the only problem is that eval is dangerous when used with arbitrary input. The safe alternative is ast.literal_eval:
>>> s = '\\xce\\xb8Oph'
>>> from ast import literal_eval
>>> literal_eval("b'{}'".format(s)).decode('utf8')
'\u03b8Oph'
With eval you are subject to:
>>> eval("b'{}'".format("1' and print('rm -rf /') or b'u r owned")).decode('utf8')
rm -rf /
'u r owned'
Since ast.literal_eval is the opposite of repr for literals, I guess it is what you are looking for.
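As a rough sketch, the literal_eval approach can be wrapped in a small function. The name repr_to_text is made up for illustration, and it assumes the escaped text contains no unescaped single quote and no non-ASCII character (either one makes the generated bytes literal invalid):

```python
from ast import literal_eval


def repr_to_text(escaped, encoding="utf-8"):
    """Hypothetical helper: parse escaped text as a bytes literal, then decode it."""
    # literal_eval only accepts plain literals, so an injected expression
    # raises an exception instead of being executed.
    return literal_eval("b'{}'".format(escaped)).decode(encoding)


# ascii() keeps the output printable on any terminal.
print(ascii(repr_to_text(r"\xce\xb8Oph")))

try:
    repr_to_text("1' and print('rm -rf /') or b'u r owned")
except (ValueError, SyntaxError) as exc:
    print("rejected:", type(exc).__name__)
```

The same malicious input that eval happily runs is rejected here, because the parsed expression is not a literal.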
[update]
If you have a file with escaped unicode, you may want to open it with the unicode_escape encoding, as suggested in the answer by Ginger++. I will keep my answer because the question was "how to convert repr into encoded string", not "how to decode file with escaped unicode".
Solution 2:
Just open your file with the unicode_escape encoding, like:
with open('name', encoding="unicode_escape") as f:
    pass  # your code here
Original answer:
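>>> '\\xce\\xb8Oph'.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'θOph'
The trailing encode('latin-1').decode('utf-8') is needed because unicode_escape turns each \xNN into the code point U+00NN, so on its own it returns 'Î¸Oph'.
You can drop the first encode to UTF-8 if you read your file in binary mode instead of text mode:
>>> b'\\xce\\xb8Oph'.decode('unicode_escape').encode('latin-1').decode('utf-8')
'θOph'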
Solution 3:
Unfortunately, this is really problematic. It's \ killing you softly here.
I can only think of this (Python 2 syntax):
s = '\\xce\\xb8Oph\\r\\nMore test\\t\\xc5\\xa1'
n = ""
x = 0
while x != len(s):
    if s[x] == "\\":
        sx = s[x+1:x+4]
        marker = sx[0:1]
        if marker == "x":
            n += chr(int(sx[1:], 16)); x += 4
        elif marker in ("'", '"', "\\", "n", "r", "v", "t", "0"):
            # Pull this dict out of a loop to speed things up
            n += {"'": "'", '"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "0": "\0"}[marker]
            x += 2
        else:
            n += s[x]; x += 1
    else:
        n += s[x]; x += 1
print repr(n), repr(s)
print repr(n.decode("UTF-8"))
There might be some other trick to pull this off, but at the moment this is all I got.
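For Python 3, the same hand-rolled loop could look roughly like the sketch below. It collects bytes in a bytearray and decodes them as UTF-8 at the end. The names unescape_manually and SIMPLE_ESCAPES are invented for this example, and it assumes every \x is followed by two hex digits (anything else raises ValueError):

```python
# Single-character escapes, hoisted out of the loop.
SIMPLE_ESCAPES = {"'": "'", '"': '"', "\\": "\\", "n": "\n", "r": "\r",
                  "t": "\t", "v": "\v", "0": "\0"}


def unescape_manually(s):
    """Hypothetical helper: resolve backslash escapes by hand, then decode UTF-8."""
    out = bytearray()
    i = 0
    while i < len(s):
        nxt = s[i + 1:i + 2]  # character after the current one, or "" at the end
        if s[i] == "\\" and nxt == "x":
            # \xNN: the two hex digits give one byte value.
            out.append(int(s[i + 2:i + 4], 16))
            i += 4
        elif s[i] == "\\" and nxt in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[nxt].encode("ascii")
            i += 2
        else:
            # Ordinary character (or an unknown escape): keep it as is.
            out += s[i].encode("utf-8")
            i += 1
    return out.decode("utf-8")


s = '\\xce\\xb8Oph\\r\\nMore test\\t\\xc5\\xa1'
print(ascii(unescape_manually(s)))
```

Working on bytes avoids the final decode-from-str step that only exists in Python 2.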
Solution 4:
To make a teeny improvement on GingerPlusPlus's answer:
import tempfile
with tempfile.TemporaryFile(mode='rb+') as f:
    f.write(r'\xce\xb8Oph'.encode())
    f.flush()
    f.seek(0)
    print(f.read().decode('unicode_escape').encode('latin1').decode())
If you open the file in binary mode (i.e. rb since you're reading; I added + because I was also writing to the file), you can skip the first encode call. It's still awkward, because you have to bounce through the decode/encode hop, but at least you avoid that first encoding step.
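The decode/encode hop can be spelled out step by step. This is only a sketch: unescape_utf8 is a made-up name, and it assumes the input holds nothing but ASCII characters and backslash escapes.

```python
def unescape_utf8(escaped):
    """Hypothetical helper: decode backslash escapes that spell out UTF-8 bytes."""
    # Accept bytes (file read in binary mode) or str (file read in text mode).
    raw = escaped if isinstance(escaped, bytes) else escaped.encode("ascii")
    # Step 1: resolve the escapes; "\xce" becomes the code point U+00CE.
    code_points = raw.decode("unicode_escape")
    # Step 2: latin-1 maps U+0000..U+00FF straight back to bytes 0x00..0xFF.
    as_bytes = code_points.encode("latin-1")
    # Step 3: interpret those bytes as UTF-8.
    return as_bytes.decode("utf-8")


print(ascii(unescape_utf8(rb"\xce\xb8Oph")))
```

Passing bytes skips the initial encode, which is the small saving that binary mode buys you.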