Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?
Solution 1:
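read_table is read_csv with the default sep=',' replaced by sep='\t'. They are two thin wrappers around the same function, so the performance will be identical. read_excel uses the xlrd package to read .xls and .xlsx files into a DataFrame (recent pandas versions read .xlsx with openpyxl instead, since xlrd 2.0 dropped .xlsx support); it doesn't handle CSV files. from_csv calls read_table, so there is no performance difference there either. (DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0.)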
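A quick way to check the equivalence is to parse the same tab-separated text with both functions. This is a minimal sketch; the sample data is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample data
TSV_TEXT = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# read_table defaults to sep='\t'; read_csv needs the separator spelled out
df_table = pd.read_table(io.StringIO(TSV_TEXT))
df_csv = pd.read_csv(io.StringIO(TSV_TEXT), sep="\t")

# Both calls go through the same parser, so the resulting frames are identical
pd.testing.assert_frame_equal(df_table, df_csv)
print(df_table.shape)  # (2, 3)
```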
Solution 2:
I've found that CSV and tab-delimited text (.txt) files are equivalent in read and write speed, and both are much faster to read and write than MS Excel files. However, the Excel format produces much smaller files.
Timings for the same 320 MB CSV file (16 MB as .xlsx), on an i7-7700K with an SSD, running Anaconda Python 3.5.3 and pandas 0.19.2, using the standard convention import pandas as pd:
2 seconds to read .csv: df = pd.read_csv('foo.csv') (same for reading the .txt version with pd.read_table)
15.3 seconds to read .xlsx: df = pd.read_excel('foo.xlsx')
10.5 seconds to write .csv: df.to_csv('bar.csv', index=False) (same for .txt)
34.5 seconds to write .xlsx: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
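To reproduce this kind of comparison on your own machine, a small timing harness like the one below may help. It is only a sketch: the synthetic DataFrame, the file names and the helper functions time_call and benchmark are hypothetical, and Excel is left out because to_excel/read_excel need an extra engine such as openpyxl installed (the same pattern extends to them). Absolute numbers will depend on your hardware and pandas version.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_call(fn, repeats=3):
    """Hypothetical helper: best wall-clock time (seconds) over a few runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(df, workdir):
    """Hypothetical helper: time writing/reading df as .csv (comma) and .txt (tab)."""
    csv_path = os.path.join(workdir, "bar.csv")
    txt_path = os.path.join(workdir, "bar.txt")
    # Dict entries are evaluated in order, so each file is written before it is read
    return {
        "write_csv": time_call(lambda: df.to_csv(csv_path, index=False)),
        "write_txt": time_call(lambda: df.to_csv(txt_path, sep="\t", index=False)),
        "read_csv": time_call(lambda: pd.read_csv(csv_path)),
        "read_table": time_call(lambda: pd.read_table(txt_path)),
    }


# Assumed synthetic data; substitute your own DataFrame for a real comparison
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20_000, 10)), columns=[f"col{i}" for i in range(10)])

with tempfile.TemporaryDirectory() as tmp:
    timings = benchmark(df, tmp)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Taking the best of several runs reduces noise from disk caching and other processes; for files as large as the one above, a single run is usually enough.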
To write your DataFrames to tab-delimited text files, you can use:
df.to_csv('bar.txt', sep='\t', index=False)
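A round trip shows that a file written this way can be read back with read_table without passing sep, since its default separator is already '\t'. This is a minimal sketch; the example frame and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"name": ["x", "y"], "value": [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bar.txt")
    # Write tab-delimited text without the index column
    df.to_csv(path, sep="\t", index=False)
    # read_table's default separator is '\t', so no sep argument is needed
    round_trip = pd.read_table(path)
    with open(path) as fh:
        first_line = fh.readline()

pd.testing.assert_frame_equal(df, round_trip)
print(repr(first_line))  # 'name\tvalue\n'
```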