
Pandas using too much memory with read_sql_table

I am trying to read a table from my Postgres database into Python. The table has around 8 million rows and 17 columns, and is about 622 MB in the database.

I can export the entire table to csv using psql and then read it in with pd.read_csv(). That works perfectly fine: the Python process only uses around 1 GB of memory and everything is good.
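Roughly, that manual route looks like this (the exact \copy command and file name here are just illustrative):

# in psql (illustrative): \copy schemaname.table_name TO 'table.csv' WITH CSV HEADER
import pandas as pd
the_frame = pd.read_csv("table.csv")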

Now, the task we need to do requires this pull to be automated, so I thought I could read the table in using pd.read_sql_table() directly from the DB, using the following code:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")
the_frame = pd.read_sql_table(table_name='table_name', con=engine, schema='schemaname')

This approach starts using a lot of memory. When I track the memory usage in Task Manager, I can see the Python process's memory climb and climb until it hits 16 GB and freezes the computer.

Any ideas on why this might be happening would be appreciated.

asked Dec 21 '16 by user4505419

People also ask

How do I reduce pandas memory usage?

Change numeric columns to a smaller dtype: instead of keeping the defaults, we can downcast the data types, e.g. store int64 values as int8 and float64 as float32 where the values fit. This will reduce memory usage.
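A minimal sketch of that downcasting idea (the column names are hypothetical):

import pandas as pd

# Hypothetical frame that pandas would store as int64/float64 by default.
df = pd.DataFrame({"counts": [1, 2, 3], "prices": [1.5, 2.25, 3.75]})

# Downcast each numeric column to the smallest dtype that can hold its values.
df["counts"] = pd.to_numeric(df["counts"], downcast="integer")  # e.g. int8
df["prices"] = pd.to_numeric(df["prices"], downcast="float")    # e.g. float32

print(df.memory_usage(deep=True))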

How many GB can Pandas handle?

The upper limit for a pandas DataFrame was 100 GB of free disk space on the machine. When your Mac needs memory, it will push something that isn't currently being used into a swap file for temporary storage.

Is pandas to_sql fast?

to_sql seems to send an INSERT query for every row, which makes it really slow. But since 0.24.0 there is a method parameter in pandas.to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
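A hedged sketch of the method='multi' path (the connection string, table name, and data are placeholders, not values from the question):

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# method='multi' batches many rows into each INSERT statement;
# chunksize bounds how many rows go into one statement.
df.to_sql("target_table", con=engine, if_exists="append",
          index=False, method="multi", chunksize=1000)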

Is pandas memory intensive?

pandas is a popular Python package for data science, as it offers powerful, expressive, and flexible data structures for data exploration and visualization. But when it comes to handling large datasets it falls short, because it cannot process data that is larger than memory.


1 Answer

You need to set the chunksize argument so that pandas iterates over smaller chunks of data instead of loading the whole table into memory at once; a short sketch follows below. See this post: https://stackoverflow.com/a/31839639/3707607
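A minimal sketch, reusing the placeholders from the question. Note that if you concatenate every chunk back together you still end up with the whole table in memory, so ideally process or downcast each chunk as it arrives:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")

# With chunksize set, read_sql_table returns an iterator of DataFrames
# instead of materialising the entire table at once.
chunks = pd.read_sql_table(table_name='table_name', con=engine,
                           schema='schemaname', chunksize=50000)

frames = []
for chunk in chunks:
    # Process, filter, or downcast each chunk here before keeping it.
    frames.append(chunk)

the_frame = pd.concat(frames, ignore_index=True)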

answered Oct 10 '22 by Ted Petrou