For further processing I need the result set of a MySQL query as a dataframe. The SQL table contains about 2 million rows and 12 columns (data size = 180 MiB). I'm running OS X 10.9 with 8 GB memory. Is it normal that pandas.read_sql takes more than 20 seconds to return the dataframe? And how can I use a chunk size option like the one in pandas.read_csv?
Edit: Python 2.7.6, pandas 0.13.1
Reading SQL queries into Pandas dataframes is a common task, and one that can be very slow. Depending on the database being used, this may be hard to get around, but for those of us using Postgres we can speed this up considerably using the COPY command.
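A minimal sketch of that approach, assuming a psycopg2 connection and a placeholder table name (my_table) and connection string: the query result is streamed out with COPY ... TO STDOUT as CSV into an in-memory buffer and then parsed with pandas.read_csv.

```python
import io

import pandas as pd
import psycopg2

# Placeholder connection parameters -- adjust for your own database.
conn = psycopg2.connect("dbname=mydb user=myuser host=localhost")

query = "SELECT * FROM my_table"

buf = io.StringIO()
with conn.cursor() as cur:
    # Stream the query result as CSV straight into the in-memory buffer.
    cur.copy_expert("COPY ({0}) TO STDOUT WITH CSV HEADER".format(query), buf)
conn.close()

# Rewind the buffer and let pandas parse the CSV text.
buf.seek(0)
df = pd.read_csv(buf)
```

This bypasses the row-by-row conversion done by the DB-API cursor, which is usually where most of the time goes.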
Overall, pandas outperformed Postgres, often running five to ten times faster on the larger datasets. The only cases where Postgres performed better were for smaller datasets, typically less than a thousand rows.
The pandas read_sql() function is used to read a SQL query or database table into a DataFrame.
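A minimal usage sketch, assuming a MySQL database reachable through SQLAlchemy; the connection string, driver (pymysql), and table name are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string (driver, credentials, host, database).
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# Run a query and load the full result set into a DataFrame.
df = pd.read_sql("SELECT * FROM my_table", engine)
print(df.shape)
```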
The pandas documentation shows that read_sql()/read_sql_query() takes about 10 times as long to read a file compared to read_hdf(), and about 3 times as long as read_csv().
read_sql() now has a chunksize argument (see the documentation).
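A sketch of reading in chunks, again assuming a SQLAlchemy engine and placeholder table name and chunk size. With chunksize set, read_sql returns an iterator of DataFrames instead of a single frame:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- adjust for your own database.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# With chunksize set, read_sql yields DataFrames of up to 100000 rows each,
# so the whole result set never has to fit in memory at once.
for chunk in pd.read_sql("SELECT * FROM my_table", engine, chunksize=100000):
    # Replace this with your own per-chunk processing.
    print(len(chunk))
```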