I am trying to read a table from my Postgres database into Python. The table has around 8 million rows and 17 columns, and is about 622 MB in the database.
I can export the entire table to CSV using psql and then read it in with pd.read_csv(). That works perfectly fine: the Python process only uses around 1 GB of memory and everything is good.
Now, the task we need to do requires this pull to be automated, so I thought I could read the table directly from the DB with pd.read_sql_table(), using the following code:
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")
the_frame = pd.read_sql_table(table_name='table_name', con=engine, schema='schemaname')
This approach uses a lot of memory. When I track the memory usage in Task Manager, I can see the Python process's memory climb and climb until it hits 16 GB and freezes the computer.
Any ideas on why this might be happening are appreciated.
Changing numeric columns to smaller dtypes: we can downcast the data types, converting int64 columns to smaller integer types (e.g. int32 or int8) and float64 columns to float32 where the values allow it. This will reduce memory usage.
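A rough sketch of that downcasting with pd.to_numeric (the column names and values here are made up for illustration; in practice you would apply this to the frame read from the database):

import pandas as pd

# Illustrative frame; stands in for the frame pulled from Postgres.
df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

# Downcast integer and float columns to the smallest dtype that can hold the values.
for col in df.select_dtypes(include="integer").columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")
for col in df.select_dtypes(include="float").columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

print(df.dtypes)  # e.g. id -> int8, score -> float32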
A pandas DataFrame has no fixed size limit of its own; it is bounded by the RAM (plus swap space) available on the machine. When the operating system needs memory, it pushes pages that aren't currently being used into a swap file on disk for temporary storage, which is why the process can keep growing past physical RAM until the machine freezes.
to_sql sends an INSERT query for every row by default, which makes it really slow. But since 0.24.0 there is a method parameter in pandas.DataFrame.to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
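A minimal sketch of that, reusing the engine, table, and schema names from the question (the if_exists and chunksize values are just example assumptions):

# Write the frame back using multi-row INSERT statements (pandas >= 0.24).
the_frame.to_sql(
    name="table_name",
    con=engine,
    schema="schemaname",
    if_exists="append",   # assumption: append to an existing table
    index=False,
    method="multi",       # pack multiple rows into a single INSERT
    chunksize=10000,      # rows per statement; example value
)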
Pandas is a popular Python package for data science, as it offers powerful, expressive, and flexible data structures for data exploration and visualization. But when it comes to handling large datasets it struggles, because it cannot process larger-than-memory data.
You need to set the chunksize argument so that pandas iterates over smaller chunks of data. See this post: https://stackoverflow.com/a/31839639/3707607
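A minimal sketch of reading in chunks, reusing the connection details from the question (the chunk size of 100,000 rows is an arbitrary example value):

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("postgresql://username:password@hostname:5432/db")

# With chunksize set, read_sql_table returns an iterator of DataFrames
# instead of one large frame.
chunks = pd.read_sql_table(
    table_name='table_name',
    con=engine,
    schema='schemaname',
    chunksize=100000,
)

frames = []
for chunk in chunks:
    # Filter, aggregate, or otherwise shrink each chunk here before keeping it.
    frames.append(chunk)

the_frame = pd.concat(frames, ignore_index=True)

Note that concatenating every chunk at the end still builds the full frame in memory; processing each chunk and discarding it is what actually keeps memory usage bounded.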