
Python Pandas - Using to_sql to write large data frames in chunks

I'm using Pandas' to_sql function to write to MySQL, which is timing out due to large frame size (1M rows, 20 columns).

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html

Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!

import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)

    # Replace NaN with None so MySQL stores NULL rather than the string 'nan'
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'

    while start_index != end_index:
        print("Writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        # The first chunk replaces any existing table; later chunks append
        if_exists_param = 'append'

        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
asked Jun 03 '14 by Krishan Gupta


People also ask

Is pandas efficient for large data sets?

The default pandas data types are not the most memory efficient. This is especially true for text data columns with relatively few unique values (commonly referred to as “low-cardinality” data). By using more efficient data types, you can store larger datasets in memory.
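
For example, a minimal sketch of the idea, assuming a made-up low-cardinality column (the data and names here are invented for illustration):

    import pandas as pd

    # Hypothetical frame with a low-cardinality text column
    df = pd.DataFrame({'state': ['CA', 'NY', 'CA', 'TX'] * 250000})

    print(df['state'].memory_usage(deep=True))  # object dtype: tens of MB
    df['state'] = df['state'].astype('category')
    print(df['state'].memory_usage(deep=True))  # category dtype: roughly 1 MB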

How big a data frame can pandas handle?

One reported upper limit for a pandas DataFrame was 100 GB, bounded by the free disk space on the machine available for swap. When the operating system needs memory, it pushes data that isn't currently being used into a swap file for temporary storage.

What is chunking in pandas?

Technically, the number of rows pandas reads from a file at a time is referred to as the chunksize. If the chunksize is 100, pandas will load the first 100 rows. The object returned is not a data frame but a TextFileReader, which must be iterated over to get the data.
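
As a minimal sketch (the file name and the process function are placeholders, not part of the original text):

    import pandas as pd

    # read_csv with chunksize returns a TextFileReader, not a DataFrame
    reader = pd.read_csv('large_file.csv', chunksize=100)
    for chunk in reader:    # each chunk is a DataFrame of up to 100 rows
        process(chunk)      # placeholder for your own per-chunk logic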

What is the syntax for a pandas DataFrame?

data takes various forms such as an ndarray, Series, map, list, dict, constants, or another DataFrame. For the row labels, the index to be used for the resulting frame is optional and defaults to np.arange(n) if no index is passed. For the column labels, the optional default is likewise np.arange(n) if no columns are passed.
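
A minimal sketch of the constructor (the values here are invented for illustration):

    import numpy as np
    import pandas as pd

    # dict of lists: keys become column labels, default index is np.arange(n)
    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

    # ndarray with explicit row and column labels
    df2 = pd.DataFrame(np.zeros((3, 2)), index=['x', 'y', 'z'], columns=['c1', 'c2'])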


2 Answers

Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

So starting from 0.15, you can specify a chunksize argument and, for example, simply do:

df.to_sql('table', engine, chunksize=20000)
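
A slightly fuller sketch of how this might look end to end (the connection string and table name are placeholders, not part of the original answer):

    import sqlalchemy

    engine = sqlalchemy.create_engine('mysql://user:password@host/db')  # placeholder credentials
    # if_exists applies to the table as a whole, not per chunk
    df.to_sql('retail_pendingcustomers', engine, if_exists='replace', chunksize=20000)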
answered Oct 03 '22 by joris


There is a beautiful, idiomatic chunks function provided in an answer to this question.

In your case, you can use it like this:

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l.iloc[i:i+n]

def write_to_db(engine, frame, table_name, chunk_size):
    for idx, chunk in enumerate(chunks(frame, chunk_size)):
        # The first chunk replaces any existing table; later chunks append
        if idx == 0:
            if_exists_param = 'replace'
        else:
            if_exists_param = 'append'
        chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)

The only drawback is that it doesn't support slicing on the second index inside the iloc call.
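
If you also need to restrict the columns, a hypothetical variant (the cols parameter is my own addition, not part of the original answer) might look like:

    def chunks_with_cols(frame, n, cols=None):
        """Yield successive n-sized row chunks, optionally restricted to cols."""
        cols = frame.columns if cols is None else cols
        for i in range(0, len(frame), n):
            yield frame.iloc[i:i+n][cols]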

answered Oct 03 '22 by nes