
How To Convert A String To Unicode/byte String In Python 3?

I know this works:

    a = u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
    print(a)  # 方法，删除存储在

But if I have a string from a JSON file which does not start with 'u

Solution 1:

If I understand correctly, the file contains the literal text \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728 (so it's plain ASCII, with backslash escapes that spell out the Unicode ordinals the same way you would in a Python str literal). If so, there are two ways to handle this:

  1. Read the file in binary mode, then call mystr = mybytes.decode('unicode-escape') to convert from bytes to str, interpreting the escapes along the way.
  2. Read the file in text mode, and use the codecs module for the "text -> text" conversion, e.g. decodedstr = codecs.decode(encodedstr, 'unicode-escape'). Bytes-to-bytes and text-to-text codecs are now supported only by the codecs module functions; bytes.decode is purely for bytes to text and str.encode is purely for text to bytes. In Py2, calling str.encode or unicode.decode was usually a mistake, and removing those dangerous methods makes it easier to understand which direction a conversion is supposed to go.
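As a minimal sketch of both options, assuming the file holds exactly the ASCII escape text from the question: the names escaped_text, escaped_bytes, from_bytes and from_text are made up for illustration, and the in-memory values stand in for what reading the file in text or binary mode would give you.

```python
import codecs

# Stand-in for the file contents: plain ASCII with literal backslashes.
# (Raw string, so Python itself does not interpret the \u escapes.)
escaped_text = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Option 1: bytes -> str. escaped_bytes plays the role of data read
# from the file in binary mode.
escaped_bytes = escaped_text.encode("ascii")
from_bytes = escaped_bytes.decode("unicode-escape")

# Option 2: str -> str. escaped_text plays the role of data read
# from the file in text mode.
from_text = codecs.decode(escaped_text, "unicode-escape")

print(from_bytes)
print(from_text)
```

Both variables should end up holding the same eight-character string. Note that the unicode-escape codec treats any bytes that are not part of an escape as Latin-1, so this is only safe when the input really is pure ASCII, as assumed here. If the file is an actual JSON document, json.loads already interprets \uXXXX escapes inside JSON strings, so no extra decoding step should be needed in that case.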
