How to transform a pandas DataFrame for insertion via an executemany() statement?

Tags:

python

database

pandas

executemany

I have a fairly big pandas DataFrame (50 or so columns and a few hundred thousand rows) that I want to transfer to a database using the ceODBC module. Previously I was using pyodbc with a simple execute statement in a for loop, but this was taking ridiculously long (about 1000 records per 10 minutes).

I'm now trying a new module and want to use executemany(), although I'm not quite sure what is meant by the sequence of parameters in:

    cursor.executemany("""insert into table.name(a, b, c, d, e)
    values(?, ?, ?, ?, ?)""", sequence_of_parameters)

Should it look like one flat list that works through each column in turn, like

    ['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2',
     '2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']

(this is an example of three rows), or is a different format needed?

As a related question, how can I convert a regular pandas DataFrame to this format?

Thanks!

asked Apr 29 '15 08:04

Colin O'Brien

2 Answers

You can try this:

cursor.executemany(sql_str, your_dataframe.values.tolist())

Hope it helps.
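
To make the expected shape concrete: executemany wants one inner sequence per row, not one flat list. `your_dataframe.values.tolist()` yields that shape, a list of lists with one inner list per row. The sketch below writes such a list out by hand and uses the standard-library sqlite3 module as a stand-in for ceODBC, purely so the example runs without a database server. The table name `demo` and its columns are hypothetical.

```python
import sqlite3

# Shape produced by your_dataframe.values.tolist(): one inner list per row.
rows = [
    ["asdas", "1", "2014-12-01", "true", "asdasd"],
    ["asdas", "2", "2014-12-02", "true", "asfasd"],
    ["asdfs", "3", "2014-12-03", "false", "asdasd"],
]

conn = sqlite3.connect(":memory:")  # stand-in for the real ODBC connection
cursor = conn.cursor()
cursor.execute("create table demo (a text, b text, c text, d text, e text)")

# One "?" per column; each inner list fills the placeholders for one row.
cursor.executemany(
    "insert into demo (a, b, c, d, e) values (?, ?, ?, ?, ?)", rows
)
conn.commit()

inserted = cursor.execute("select count(*) from demo").fetchone()[0]
conn.close()
```

The number of `?` placeholders has to match both the column list and the length of every inner list.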

answered Oct 04 '22 00:10

ansen

I managed to figure this out in the end. If you have a pandas DataFrame that you want to write to a database using ceODBC (the module I used), the code is as follows.

With all_data as the DataFrame, map the values to strings and store each row as a tuple in a list of tuples:

for r in all_data.columns.values:
    all_data[r] = all_data[r].map(str)
    all_data[r] = all_data[r].map(str.strip)   
tuples = [tuple(x) for x in all_data.values]

In the list of tuples, change every null marker (which the string conversion above has turned into text such as 'nan') into None, so that it reaches the database as a proper NULL. This was an issue for me; it might not be for you.

string_list = ['NaT', 'nan', 'NaN', 'None']

def remove_wrong_nulls(x):
    for r in range(len(x)):
        for i,e in enumerate(tuples):
            for j,k in enumerate(e):
                if k == x[r]:
                    temp=list(tuples[i])
                    temp[j]=None
                    tuples[i]=tuple(temp)

remove_wrong_nulls(string_list)
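
The same replacement can be done in a single pass over the rows instead of one pass per marker. What follows is a sketch of that variant, not the code used in this answer. The helper name `replace_null_markers` and the sample rows are made up for illustration.

```python
NULL_MARKERS = {"NaT", "nan", "NaN", "None"}

def replace_null_markers(rows, markers=NULL_MARKERS):
    """Return new row tuples with every marker string replaced by None."""
    return [
        tuple(None if value in markers else value for value in row)
        for row in rows
    ]

sample = [("a", "nan", "2014-12-01"), ("None", "2", "NaT")]
cleaned = replace_null_markers(sample)
```

Like the loop above, this also turns a genuine text value of 'None' or 'nan' into NULL, so it only suits data where those strings never occur as real values.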

Create a connection to the database:

import ceODBC

cnxn = ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
cursor = cnxn.cursor()

Define a function that splits the list of tuples into chunks of 1000 rows, stored in new_list. I needed this because the database would not accept a SQL query larger than 1 MB.

def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

new_list = chunks(tuples, 1000)
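
As a quick check of what chunks returns, here it is applied to a small list with a chunk size of 3 (the sample rows are made up):

```python
def chunks(l, n):
    n = max(1, n)  # guard against a zero or negative chunk size
    return [l[i:i + n] for i in range(0, len(l), n)]

rows = [(1,), (2,), (3,), (4,), (5,), (6,), (7,)]
parts = chunks(rows, 3)
# parts == [[(1,), (2,), (3,)], [(4,), (5,), (6,)], [(7,)]]
```

The last chunk is simply shorter when the number of rows is not a multiple of n.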

Define your query:

query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
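
With around 50 columns, typing the placeholders by hand makes it easy to end up with a different number of `?` than columns. One option is to build the statement from the column list. This is only a sketch: `build_insert` is a hypothetical helper, the table and column names are placeholders, and the names are pasted into the SQL as-is, so they should come from your own code and never from user input.

```python
def build_insert(table, columns):
    """Build a parameterised INSERT with one "?" per column."""
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return "insert into {} ({}) values ({})".format(
        table, column_list, placeholders
    )

query = build_insert("my_table", ["a", "b", "c"])
# query == 'insert into my_table ("a", "b", "c") values (?, ?, ?)'
```

For a DataFrame, `list(all_data.columns)` could supply the column names, provided they match the column names in the target table.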

Loop over new_list, which holds the tuples in groups of 1000, and call executemany on each group. Then commit, close the connection, and that's it :)

for chunk in new_list:
    cursor.executemany(query, chunk)
cnxn.commit()
cnxn.close()
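
Because autocommit is off, nothing is stored permanently until commit() runs, so a failure part-way through can be rolled back rather than leaving a partial load. A sketch of that pattern follows, with the standard-library sqlite3 module standing in for ceODBC so that it runs anywhere. The function name `insert_in_chunks` and the table `demo` are hypothetical, and it assumes the real connection offers the usual DB-API commit() and rollback() methods.

```python
import sqlite3

def insert_in_chunks(conn, query, rows, chunk_size=1000):
    """Insert rows in batches; commit once at the end, roll back on error."""
    cursor = conn.cursor()
    try:
        for start in range(0, len(rows), chunk_size):
            cursor.executemany(query, rows[start:start + chunk_size])
        conn.commit()
    except Exception:
        conn.rollback()  # discard every batch sent so far
        raise

conn = sqlite3.connect(":memory:")  # stand-in for ceODBC.connect(...)
conn.execute("create table demo (a text, b text, c text)")

# None in a row tuple is stored as NULL.
rows = [(str(i), "x", None) for i in range(2500)]
insert_in_chunks(conn, "insert into demo (a, b, c) values (?, ?, ?)", rows)

total = conn.execute("select count(*) from demo").fetchone()[0]
null_count = conn.execute(
    "select count(*) from demo where c is null"
).fetchone()[0]
conn.close()
```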

answered Oct 04 '22 02:10

Colin O'Brien
