pandas.read_sql processing speed

Tags: python, pandas

I need the result set of a MySQL query as a DataFrame for further processing. The SQL table contains about 2 million rows and 12 columns (data size = 180 MiB). I'm running OS X 10.9 with 8 GB of memory. Is it normal that pandas.read_sql takes more than 20 seconds to return the DataFrame? How can I implement a chunksize option like the one in pandas.read_csv?

Edit: Python 2.7.6, pandas 0.13.1
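
For reference, a minimal sketch of the kind of call involved; the connection details and table name (mytable) are placeholders, not taken from the original question:

```python
import pandas as pd
import MySQLdb  # MySQL driver commonly used with pandas 0.13-era code

# Placeholder connection details
conn = MySQLdb.connect(host="localhost", user="user",
                       passwd="secret", db="mydb")

# Reads the full result set (~2 million rows, 12 columns) into memory at once
df = pd.read_sql("SELECT * FROM mytable", conn)
```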

Yann asked Apr 04 '14 23:04



1 Answer

The pandas documentation shows that read_sql()/read_sql_query() takes about 10 times as long to read a file as read_hdf(), and about 3 times as long as read_csv().

read_sql() now has a chunksize argument (see the documentation).
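
As a minimal sketch of chunked reading (the SQLAlchemy connection string and table name mytable are placeholder assumptions):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string
engine = create_engine("mysql+pymysql://user:secret@localhost/mydb")

# With chunksize set, read_sql returns an iterator of DataFrames
# rather than one large DataFrame, which caps peak memory use.
chunks = []
for chunk in pd.read_sql("SELECT * FROM mytable", engine, chunksize=100000):
    # process or filter each 100k-row chunk here before keeping it
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
```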

Adrien Pacifico answered Sep 22 '22 14:09