I am trying to find the best way to efficiently write large DataFrames (250MB+) to disk and read them back using Python/pandas. I've tried all of the methods in Python for Data Analysis, but the performance has been very disappointing.
This is part of a larger project exploring migrating our current analytic/data management environment from Stata to Python. When I compare the read/write times in my tests to those that I get with Stata, Python and Pandas are typically taking more than 20 times as long.
I strongly suspect that I am the problem, not Python or Pandas.
Any suggestions?
The practical size limit for a pandas DataFrame is the amount of memory available on your machine, not a set number of rows or cells.
The default pandas data types are not the most memory efficient. This is especially true for text data columns with relatively few unique values (commonly referred to as “low-cardinality” data). By using more efficient data types, you can store larger datasets in memory.
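As an illustrative sketch (the column names and sizes here are made up), converting a low-cardinality text column to the `category` dtype can cut its memory footprint dramatically:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one million rows, a text column with only three unique values.
n = 1_000_000
df = pd.DataFrame({
    "state": np.random.choice(["CA", "NY", "TX"], size=n),
    "value": np.random.randn(n),
})

print(df["state"].memory_usage(deep=True))    # object dtype: roughly tens of MB
df["state"] = df["state"].astype("category")  # stores integer codes plus a small lookup table
print(df["state"].memory_usage(deep=True))    # typically a few MB
```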
HDF5: This storage format is well suited to large amounts of heterogeneous data. Data is stored in an internal, hierarchical, file-system-like structure, which also makes it efficient to randomly access different parts of a dataset. For many data structures, the file size and access speed are much better than CSV.
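A rough sketch of that random-access pattern (the file name, key, and column names are arbitrary, and writing HDF5 from pandas requires the PyTables package). Using format="table" produces a queryable file, so you can read back only the rows you need:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": np.arange(1_000_000),
                   "value": np.random.randn(1_000_000)})

# format="table" writes a queryable PyTables table, so later reads can
# pull back just the rows of interest instead of the whole frame.
df.to_hdf("data.h5", key="df", mode="w", format="table", data_columns=["id"])

subset = pd.read_hdf("data.h5", "df", where="id > 990000")
```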
Using HDFStore is your best bet (it is not covered very much in the book and has changed quite a lot). You will find its performance is MUCH better than any other serialization method; a minimal usage sketch follows the references below.
Useful references:
- How to write/read various forms of HDF5
- Some recipes using HDF5
- Comparing performance of various writing/reading methods
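A minimal HDFStore sketch, assuming PyTables is installed; the file name, key, chunk size, and compression settings are illustrative rather than recommended values:

```python
import numpy as np
import pandas as pd

# Write a large frame in chunks to a compressed, queryable table store.
with pd.HDFStore("store.h5", mode="w", complib="blosc", complevel=9) as store:
    for _ in range(10):
        chunk = pd.DataFrame(np.random.randn(100_000, 4), columns=list("abcd"))
        store.append("df", chunk, data_columns=True)  # appends grow the same on-disk table

    # Read back only the rows matching a condition instead of the whole frame.
    subset = store.select("df", where="a > 0")
```

Appending in chunks keeps peak memory low, and data_columns=True lets those columns be filtered on in where clauses when selecting.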