I'm working with a large pandas DataFrame that needs to be dumped into a PostgreSQL table. From what I've read it's not a good idea to dump it all at once (and I was locking up the db); rather, use the chunksize parameter. The answers here are helpful for workflow, but I'm asking specifically about how the value of chunksize affects performance.
In [5]: df.shape
Out[5]: (24594591, 4)

In [6]: df.to_sql('existing_table', con=engine, index=False, if_exists='append', chunksize=10000)
Is there a recommended default and is there a difference in performance when setting the parameter higher or lower? Assuming I have the memory to support a larger chunksize, will it execute faster?
to_sql seems to send an INSERT query for every row, which makes it really slow. But since pandas 0.24.0 there is a method parameter in DataFrame.to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
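For example, a minimal sketch of the method='multi' call (the connection URL, DataFrame, and table name below are placeholders, not from the question):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder URL
df = pd.DataFrame({'a': range(100000), 'b': range(100000)})  # placeholder data

# method='multi' batches many rows into each INSERT statement;
# chunksize caps how many rows go into one statement.
df.to_sql('my_table', con=engine, index=False, if_exists='append',
          chunksize=1000, method='multi')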
Sometimes we use the chunksize parameter while reading large datasets to divide the dataset into chunks of data. We specify the size of these chunks with the chunksize parameter. This saves memory and improves the efficiency of the code.
Technically, chunksize is the number of rows pandas reads from a file at a time. For example, if the chunksize is 100, pandas will load the first 100 rows, then the next 100, and so on.
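A minimal sketch of that reading pattern (the file name big_file.csv is a placeholder):

import pandas as pd

total_rows = 0
# chunksize=100 makes read_csv return an iterator of DataFrames,
# each holding at most 100 rows, so the whole file never sits in memory at once.
for chunk in pd.read_csv('big_file.csv', chunksize=100):
    total_rows += len(chunk)

print(total_rows)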
In my case, 3M rows with 5 columns were inserted in 8 minutes when I used the pandas to_sql parameters chunksize=5000 and method='multi'. This was a huge improvement, as inserting 3M rows into the database from Python had been very hard for me.
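Since the method parameter also accepts a callable (as mentioned above), here is a rough sketch of a COPY-based insertion function for PostgreSQL, adapted from the pattern shown in the pandas documentation; the name psql_insert_copy is illustrative, and the code assumes the underlying driver is psycopg2:

import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    # Signature expected by DataFrame.to_sql(method=...):
    # table is a pandas SQLTable, conn a SQLAlchemy connection,
    # keys the column names, data_iter an iterable of row tuples.
    dbapi_conn = conn.connection  # raw psycopg2 connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        table_name = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert('COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns), buf)

# Usage (placeholder table/engine):
# df.to_sql('existing_table', con=engine, index=False, if_exists='append',
#           chunksize=5000, method=psql_insert_copy)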
I tried something the other way around: from SQL to CSV. I noticed that the smaller the chunksize, the quicker the job was done. Adding additional CPUs to the job (multiprocessing) didn't change anything.
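A rough sketch of that SQL-to-CSV direction, assuming placeholder query, connection URL, and output file names (not taken from the answer above):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder URL

# Stream the result set in chunks and append each chunk to the CSV,
# writing the header only for the first chunk.
first = True
for chunk in pd.read_sql('SELECT * FROM existing_table', engine, chunksize=10000):
    chunk.to_csv('export.csv', mode='w' if first else 'a', header=first, index=False)
    first = False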