
Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?

I tend to import .csv files into pandas, but sometimes I may get data in other formats to make DataFrame objects. Today, I just found out about read_table as a 'generic' importer f

Solution 1:

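  1. read_table is read_csv with the default sep=',' replaced by sep='\t'. The two are thin wrappers around the same parsing function, so their performance is identical. read_excel reads .xls and .xlsx files into a DataFrame through an Excel engine (the xlrd package at the time of writing; newer pandas versions use openpyxl for .xlsx), and it does not handle CSV files.
  2. from_csv calls read_table internally, so there is no performance difference there either. (DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0; use read_csv instead.)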
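A quick way to check that read_table and read_csv share one parser is to feed both the same tab-separated text and compare the results. This is a minimal sketch assuming a reasonably recent pandas; the sample data and the helper name load_both are hypothetical, made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: a header plus three rows of tab-separated text.
TSV_TEXT = "name\tscore\nalice\t90\nbob\t85\ncarol\t77\n"


def load_both(text):
    """Parse the same text with read_table and with read_csv using a tab separator."""
    via_table = pd.read_table(io.StringIO(text))        # read_table defaults to a tab
    via_csv = pd.read_csv(io.StringIO(text), sep="\t")  # read_csv needs it spelled out
    return via_table, via_csv


via_table, via_csv = load_both(TSV_TEXT)

# Same parser, same arguments after defaults are applied, so the frames match exactly.
pd.testing.assert_frame_equal(via_table, via_csv)
```

Since only the default separator differs, any speed difference you measure between the two should be noise rather than a real gap.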

Solution 2:

I've found that CSV and tab-delimited text (.txt) files are equivalent in read and write speed, and both are much faster to read and write than MS Excel files. However, the Excel format produces much smaller files, because .xlsx is a zip-compressed format.


The timings below are for the same data set, a 320 MB CSV file (16 MB as .xlsx), on an i7-7700K with an SSD, running Anaconda Python 3.5.3 and pandas 0.19.2.

Using the standard convention import pandas as pd:

2 seconds to read .csv: df = pd.read_csv('foo.csv') (same for pd.read_table)

15.3 seconds to read .xlsx: df = pd.read_excel('foo.xlsx')

10.5 seconds to write .csv: df.to_csv('bar.csv', index=False) (same for .txt)

34.5 seconds to write .xlsx: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
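To repeat a comparison like this on your own machine, something along these lines could work. It is a sketch, not the setup used for the numbers above: the synthetic random-float DataFrame, the helper names timed and benchmark, and the default sizes are all assumptions, and the Excel step only runs if the openpyxl engine is installed.

```python
import importlib.util
import os
import tempfile
import time

import numpy as np
import pandas as pd


def timed(label, func, results):
    """Run func once and store its wall-clock time (seconds) in results[label]."""
    start = time.perf_counter()
    func()
    results[label] = time.perf_counter() - start


def benchmark(n_rows=100_000, n_cols=10, seed=0):
    """Time one write and one read per format on a synthetic DataFrame.

    Random floats stand in for real data (an assumption); absolute timings
    depend on column types, disk speed and the pandas version.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.random((n_rows, n_cols)),
                      columns=[f"c{i}" for i in range(n_cols)])
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "bar.csv")
        txt_path = os.path.join(tmp, "bar.txt")
        timed("write .csv", lambda: df.to_csv(csv_path, index=False), results)
        timed("write .txt", lambda: df.to_csv(txt_path, sep="\t", index=False), results)
        timed("read .csv", lambda: pd.read_csv(csv_path), results)
        timed("read .txt", lambda: pd.read_table(txt_path), results)

        # Excel timings are optional: writing/reading .xlsx needs openpyxl.
        if importlib.util.find_spec("openpyxl") is not None:
            xlsx_path = os.path.join(tmp, "bar.xlsx")
            timed("write .xlsx",
                  lambda: df.to_excel(xlsx_path, sheet_name="Sheet1", index=False),
                  results)
            timed("read .xlsx", lambda: pd.read_excel(xlsx_path), results)
    return results
```

Calling benchmark() returns a dict of seconds per operation. Each operation runs once, matching the one-shot numbers above; for steadier figures you would repeat each call and take the minimum. Expect the absolute values to differ from the ones reported here, which came from a different machine and pandas version.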


To write your DataFrames to tab-delimited text files, you can use:

df.to_csv('bar.txt', sep='\t', index=False)
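A round trip is an easy way to confirm the output: write tab-delimited text with to_csv(sep='\t'), then load it back with read_table, whose default separator is a tab. A minimal sketch; the sample frame is hypothetical and a temporary directory stands in for a real output path.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame.
df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bar.txt")
    # Tab-delimited, without the index column.
    df.to_csv(path, sep="\t", index=False)
    # read_table already expects tabs, so no sep argument is needed.
    round_trip = pd.read_table(path)
    with open(path) as fh:
        header_line = fh.readline()

# The header row is tab-separated and the data survives unchanged.
pd.testing.assert_frame_equal(round_trip, df)
```

Dropping index=False would write the row index as an extra unnamed first column, which read_table would then load back as a regular column.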

