Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?

I usually import .csv files into pandas, but sometimes I receive data in other formats that I need to turn into DataFrame objects.

Today I found out about read_table as a "generic" importer for other formats, and wondered whether there are significant performance differences between the various pandas methods for reading files, e.g. read_csv, read_table, from_csv and read_excel.

  1. Do these other methods have better performance than read_csv?
  2. Is read_csv much different from from_csv for creating a DataFrame?
asked Jul 11 '15 at 22:07 by pylang

2 Answers

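  1. read_table is read_csv with sep=',' replaced by sep='\t'. They are two thin wrappers around the same function, so their performance will be identical. read_excel uses the xlrd package to read .xls and .xlsx files into a DataFrame (recent pandas versions use openpyxl for .xlsx); it doesn't handle .csv files.
  2. from_csv calls read_table, so no. It only applies different defaults (index_col=0, parse_dates=True). Note that from_csv was deprecated in pandas 0.21 and removed in 1.0.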
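
To illustrate point 1, here is a minimal sketch that reads the same rows through both functions. The sample data and variable names are made up for the example, and it only checks that the resulting frames match; it does not measure speed.

```python
import io

import pandas as pd

# Hypothetical sample data: the same rows rendered with two delimiters.
comma_text = "a,b\n1,2\n3,4\n"
tab_text = comma_text.replace(",", "\t")

# read_csv defaults to sep=',' and read_table defaults to sep='\t'.
df_csv = pd.read_csv(io.StringIO(comma_text))
df_table = pd.read_table(io.StringIO(tab_text))

# Either function can read the other's format when sep is given explicitly.
df_csv_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
df_table_comma = pd.read_table(io.StringIO(comma_text), sep=",")

# All four calls should produce the same DataFrame.
same = (
    df_csv.equals(df_table)
    and df_csv.equals(df_csv_tab)
    and df_csv.equals(df_table_comma)
)
print(same)
```
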
answered Oct 18 '22 at 19:10 by Daniel Boline

I've found that CSV and tab-delimited text (.txt) are equivalent in read and write speed, and both are much faster than reading and writing MS Excel files. However, the Excel format is compressed, so the resulting files are much smaller.


For the same data stored as a 320 MB .csv file (16 MB as .xlsx), on an i7-7700k with an SSD, running Anaconda Python 3.5.3 and pandas 0.19.2, and using the standard convention import pandas as pd:

  Read .csv, 2 seconds: df = pd.read_csv('foo.csv') (same for pd.read_table)

  Read .xlsx, 15.3 seconds: df = pd.read_excel('foo.xlsx')

  Write .csv, 10.5 seconds: df.to_csv('bar.csv', index=False) (same for .txt)

  Write .xlsx, 34.5 seconds: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
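
If you want to reproduce this kind of comparison on your own machine, something along these lines could work. It is only a rough sketch: the time_call helper, the small generated frame and the temporary file names are assumptions for illustration, not the setup used for the numbers above, and a single timed call on a small file will be noisy.

```python
import os
import tempfile
import time

import pandas as pd


def time_call(func, *args, **kwargs):
    """Hypothetical helper: return (result, elapsed_seconds) for one call."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Small made-up frame; the measurements above used a 320 MB file.
df = pd.DataFrame({"x": range(10000), "y": [i * 0.5 for i in range(10000)]})

timings = {}
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "bar.csv")
    txt_path = os.path.join(tmp, "bar.txt")

    _, timings["write csv"] = time_call(df.to_csv, csv_path, index=False)
    _, timings["write txt"] = time_call(df.to_csv, txt_path, sep="\t", index=False)
    df_csv, timings["read csv"] = time_call(pd.read_csv, csv_path)
    df_txt, timings["read txt"] = time_call(pd.read_table, txt_path)
    # df.to_excel / pd.read_excel can be timed the same way, provided an
    # Excel engine such as openpyxl is installed.

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

For steadier numbers, repeat each call several times and take the minimum or the median.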


To write your dataframes to tab-delimited text files you can use:

df.to_csv('bar.txt', sep='\t', index=False)
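
As a quick check that the tab-delimited output reads back cleanly, here is a small round-trip sketch. The frame is made up, and it writes to an in-memory buffer rather than 'bar.txt' so that it leaves no file behind.

```python
import io

import pandas as pd

# Hypothetical frame standing in for df.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write tab-delimited text to an in-memory buffer instead of 'bar.txt'.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()

# read_table's default delimiter is a tab, so no sep argument is needed.
restored = pd.read_table(io.StringIO(text))
print(restored.equals(df))
```
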

answered Oct 18 '22 at 20:10 by griffinc
