ValueError: Import Data Via Chunks Into Pandas.csv_reader()
I have a large gzip file that I would like to import into a pandas DataFrame. Unfortunately, the rows do not all have the same number of columns. The data has roughly this format: .... Col_
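The failure itself is easy to reproduce with a small in-memory CSV that stands in for the real file (the sample rows are made up). When a row has more fields than the header, pandas raises pandas.errors.ParserError. That class subclasses ValueError, so `except ValueError` also catches it:

```python
import io

import pandas as pd

# Hypothetical sample: the third line has one more field than the header.
ragged = "a,b\n1,2\n3,4,5\n"

caught = None
try:
    pd.read_csv(io.StringIO(ragged))
except pd.errors.ParserError as err:
    caught = err  # keep a reference; `err` is unbound after the except block

# ParserError is a ValueError subclass.
print(type(caught).__name__, isinstance(caught, ValueError))
print(caught)
```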
Solution 1:
You could also try this:
import pandas as pd

for chunk in pd.read_csv(filename, sep='\t', chunksize=10**5, engine='python', error_bad_lines=False):
    print(chunk)
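error_bad_lines=False would skip the bad lines, though, so their data is lost. I will see if a better alternative can be found. Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions, pass on_bad_lines='skip' instead.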
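Here is a minimal sketch of the same chunked read using on_bad_lines='skip'. It writes a tiny tab-separated .gz file to a temporary directory so that it runs on its own. The file name, sample rows, and chunk size are assumptions, not taken from the question. Compression is inferred from the .gz suffix, and the default C engine is used instead of engine='python', because it also supports skipping bad lines:

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical sample data: the line "3\t4\t5" has one field too many.
rows = "a\tb\n1\t2\n3\t4\t5\n6\t7\n8\t9\n"

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.tsv.gz")  # assumed file name
    with gzip.open(filename, "wt") as fh:
        fh.write(rows)

    chunks = []
    # compression="infer" (the default) picks gzip from the .gz suffix;
    # on_bad_lines="skip" silently drops rows with too many fields.
    for chunk in pd.read_csv(filename, sep="\t", chunksize=2, on_bad_lines="skip"):
        chunks.append(chunk)  # process each chunk here instead of collecting it

df = pd.concat(chunks, ignore_index=True)
print(df)
```

With a real file, chunksize would be much larger (the question's snippet uses 10**5), and each chunk would normally be processed rather than concatenated.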
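EDIT: To keep track of the lines that error_bad_lines would skip, we can parse each error message, record the offending line, and retry the read with that line skipped. The recorded lines can then be added back to the DataFrame afterwards: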
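import pandas as pd

line = []      # 0-based indices of the rows to skip on the next attempt
expected = []  # field count pandas expected for each bad row
saw = []       # field count actually found in each bad row
cont = True
while cont:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        msg = str(e)  # e.message only exists in Python 2
        errortype = msg.split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            # msg looks like "Error tokenizing data. C error: Expected 2 fields in line 3, saw 3"
            cerror = msg.split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if n.isdigit()]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)  # message line numbers are 1-based, skiprows is 0-based
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')
            break  # stop here, otherwise the same failing read repeats forever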
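Parsing the exception text is brittle, because it depends on the exact wording of pandas' error messages. On pandas 1.4 or newer, on_bad_lines also accepts a callable when engine='python' is used. The callable receives each bad line already split into a list of strings, and returning None drops that line from the result. The sketch below uses that hook to collect the skipped lines instead of losing them. The sample data and the helper name keep_bad_line are hypothetical:

```python
import io

import pandas as pd

# Hypothetical sample: "3,4,5" has more fields than the header.
text = "a,b\n1,2\n3,4,5\n6,7\n"

bad_lines = []  # every skipped line, as a list of strings


def keep_bad_line(fields):
    """Record a malformed line and return None so pandas skips it."""
    bad_lines.append(fields)
    return None


df = pd.read_csv(io.StringIO(text), engine="python", on_bad_lines=keep_bad_line)

# The skipped rows are still available, e.g. as their own DataFrame.
bad_df = pd.DataFrame(bad_lines)
print(df)
print(bad_df)
```

Because the bad rows have a different width, they are kept in a separate DataFrame here. How to merge them back depends on what the extra columns mean in the real data.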