
Problem Opening A Text Document - Unicode Error

I probably have a rather simple question. However, I am just starting to use Python and it's driving me crazy. I am following the instructions in a book and would like to open a si

Solution 1:

(unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape

This probably means that the file you are trying to read is not in the encoding that open() expects. Apparently open() expects a Unicode encoding (most likely UTF-8 or UTF-16), but your file is encoded differently.

You should not normally use plain open() for reading text files, as it is impossible to correctly read a text file (unless it's pure ASCII) without specifying an encoding.

Use codecs instead:

import codecs

fileObj = codecs.open("someFile", "r", "utf-8")
u = fileObj.read()  # Returns a Unicode string decoded from the UTF-8 bytes in the file
fileObj.close()     # Release the file handle when done
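On Python 3 the built-in open() accepts an encoding argument directly, so the same read works without importing codecs. A minimal sketch, assuming the file is UTF-8; the helper name read_text_file and the temporary file are illustrative, not part of the original answer:

```python
import os
import tempfile


def read_text_file(path, encoding="utf-8"):
    """Read a whole text file and return it as a str (hypothetical helper)."""
    # The with-block closes the file even if decoding fails partway through.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Write a small UTF-8 file containing non-ASCII text, then read it back.
    path = os.path.join(tempfile.mkdtemp(), "someFile.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Grüße aus Köln\n")
    print(read_text_file(path))
```

Passing a different encoding (for example "cp1252") is how you would read a file that is not UTF-8.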

Solution 2:

Change that to

# for Python 2.5+
import sys
try:
    d = open("p0901aus.txt", "w")
except Exception, ex:
    print "Unsuccessful."
    print ex
    sys.exit(1)  # non-zero exit status signals failure

# for Python 3
import sys
import codecs
try:
    d = codecs.open("p0901aus.txt", "w", "utf-8")
except Exception as ex:
    print("Unsuccessful.")
    print(ex)
    sys.exit(1)  # non-zero exit status signals failure

The mode letter is case-sensitive: it must be a lowercase "w", not "W". I don't want to hit you with all the Python syntax at once, but it is useful to know how to display which exception was raised, and the except clause above is one way to do it.
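To see what case-sensitivity means in practice: on Python 3, open() rejects an uppercase mode letter with a ValueError before it touches the file system. A small sketch; the helper open_mode_is_valid and the temporary path are illustrative only:

```python
import os
import tempfile


def open_mode_is_valid(path, mode):
    """Return True if open() accepts `mode`, False if it raises ValueError."""
    try:
        f = open(path, mode)
    except ValueError:
        # Raised for unknown mode characters such as "W".
        return False
    f.close()
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "p0901aus.txt")
    print(open_mode_is_valid(path, "w"))  # True: lowercase w creates the file
    print(open_mode_is_valid(path, "W"))  # False: uppercase W is not a valid mode
```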

Also, you are opening the file for writing, not reading. Is that what you wanted?
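The distinction matters because mode "w" truncates an existing file to zero length as soon as it is opened, even if nothing is written, while "r" leaves the content alone. A short sketch using a throwaway temporary file (the path is illustrative):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "p0901aus.txt")

# Create the file with some content.
with open(path, "w", encoding="utf-8") as f:
    f.write("line one\nline two\n")

# Opening for reading leaves the content intact.
with open(path, "r", encoding="utf-8") as f:
    before = f.read()

# Opening for writing empties the file, even though nothing is written here.
with open(path, "w", encoding="utf-8") as f:
    pass

with open(path, "r", encoding="utf-8") as f:
    after = f.read()

print(repr(before))  # 'line one\nline two\n'
print(repr(after))   # ''
```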

If there is already a document named p0901aus.txt and you want to read it, do this:

# for Python 2.5+
import sys
try:
    d = open("p0901aus.txt", "r")
    print "Awesome, I opened p0901aus.txt.  Here is what I found there:"
    for l in d:
        print l
except Exception, ex:
    print "Unsuccessful."
    print ex
    sys.exit(1)  # non-zero exit status signals failure

# for Python 3
import sys
import codecs
try:
    d = codecs.open("p0901aus.txt", "r", "utf-8")
    print("Awesome, I opened p0901aus.txt.  Here is what I found there:")
    for l in d:
        print(l, end="")  # each line already ends with a newline
except Exception as ex:
    print("Unsuccessful.")
    print(ex)
    sys.exit(1)  # non-zero exit status signals failure
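On current Python 3 the same read can also be written with a with-block, which closes the file automatically, and by catching OSError (the base class of FileNotFoundError and PermissionError) instead of every Exception. A sketch, assuming UTF-8 content; read_lines is a hypothetical helper name. Note that a decoding failure raises UnicodeDecodeError, which is not an OSError and would still propagate:

```python
import os
import sys
import tempfile


def read_lines(path, encoding="utf-8"):
    """Return the file's lines without trailing newlines (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as d:
        return [line.rstrip("\n") for line in d]


if __name__ == "__main__":
    # Create a sample file so the sketch runs anywhere.
    path = os.path.join(tempfile.mkdtemp(), "p0901aus.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    try:
        lines = read_lines(path)
    except OSError as ex:
        print("Unsuccessful.")
        print(ex)
        sys.exit(1)
    print("Awesome, I opened p0901aus.txt.  Here is what I found there:")
    for line in lines:
        print(line)
```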

You can, of course, use codecs in Python 2.5 as well, and your code will be higher quality ("correct") if you do. Python 3 appears to treat the Byte Order Mark as something between a curiosity and line noise, which is a bummer.
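If the BOM is the annoyance, one option not mentioned in the answer is Python's "utf-8-sig" codec: it strips a leading UTF-8 BOM when decoding, whereas plain "utf-8" keeps it as the character U+FEFF. A minimal sketch on raw bytes; the same codec name also works as the encoding argument of open() or codecs.open():

```python
import codecs

# UTF-8 encoded text preceded by the UTF-8 byte order mark (EF BB BF).
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

with_bom = raw.decode("utf-8")         # BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")  # BOM is stripped

print(repr(with_bom))     # '\ufeffhello'
print(repr(without_bom))  # 'hello'
```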

Solution 3:

import csv

data = csv.reader(open('c:\x\list.csv'))

for row in data:
    print(row)

print('ready')

This brings up "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \xXX escape".

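In an ordinary string literal a backslash starts an escape sequence, and \x must be followed by two hexadecimal digits, so Python rejects the literal before open() is even called. Try 'c:\\x\\list.csv' instead of 'c:\x\list.csv'.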

This is Python 3 code.
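To make the cause concrete, the sketch below uses compile() to show that the original literal fails at compile time, then shows three equivalent spellings of the path: doubled backslashes, a raw string, and forward slashes (which Windows also accepts). The helper print_csv is illustrative and mirrors the loop above; c:\x\list.csv is the question's path and need not exist for the checks to run:

```python
import csv
import ntpath


def print_csv(path):
    """Print each row of a CSV file, then 'ready' (mirrors the loop above)."""
    # The csv module documentation recommends opening files with newline="".
    with open(path, newline="") as f:
        for row in csv.reader(f):
            print(row)
    print("ready")


# The original literal fails when Python compiles it, before open() runs.
bad_source = "data = open('c:\\x\\list.csv')"
try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

# Three working spellings of the same Windows path.
escaped = "c:\\x\\list.csv"  # every backslash doubled
raw = r"c:\x\list.csv"       # raw string: backslashes are kept as-is
forward = "c:/x/list.csv"    # forward slashes

print(escaped == raw)                       # True
print(ntpath.normpath(forward) == escaped)  # True

# On a machine that actually has the file: print_csv(raw)
```

Raw strings are usually the least error-prone choice for Windows paths typed by hand.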
