
How to transform a pandas DataFrame for insertion via an executemany() statement?

I have a fairly big pandas DataFrame - 50 or so headers and a few hundred thousand rows of data - and I'm looking to transfer this data to a database using the ceODBC module. Previously I was using pyodbc with a simple execute statement in a for loop, but this was taking ridiculously long (1000 records per 10 minutes)...

I'm now trying a new module and am trying to introduce executemany(), although I'm not quite sure what's meant by the sequence of parameters in:

    cursor.executemany("""insert into table.name(a, b, c, d, e, f)
    values(?, ?, ?, ?, ?, ?)""", sequence_of_parameters)

should it look like one flat list working through each header, like

    ['asdas', '1', '2014-12-01', 'true', 'asdasd', 'asdas', '2', 
'2014-12-02', 'true', 'asfasd', 'asdfs', '3', '2014-12-03', 'false', 'asdasd']
  • where this is an example of three rows

or what is the format that's needed?

As another related question, how can I go about converting a regular pandas DataFrame to this format?

Thanks!

Colin O'Brien asked Apr 29 '15


2 Answers

You can try this:

cursor.executemany(sql_str, your_dataframe.values.tolist())

Hope it helps.
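For illustration, here's a minimal sketch of what that produces, using a toy DataFrame (the column names are made up):

```python
import pandas as pd

# toy DataFrame standing in for the real one
df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

# each row becomes a list, giving a valid sequence of parameters
params = df.values.tolist()
print(params)  # -> [['x', 1], ['y', 2]]
```

One caveat: with mixed dtypes, .values upcasts everything to a single object array, and missing values come through as float('nan') rather than None, which some database drivers reject.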

ansen answered Oct 04 '22


I managed to figure this out in the end. If you have a pandas DataFrame that you want to write to a database using ceODBC (the module I used), the code is as follows.

(with all_data as the DataFrame) Map the DataFrame values to strings and store each row as a tuple in a list of tuples:

for r in all_data.columns.values:
    all_data[r] = all_data[r].map(str)        # note: missing values become the string 'nan'
    all_data[r] = all_data[r].map(str.strip)
tuples = [tuple(x) for x in all_data.values]
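As a quick sanity check of the step above, on toy data (assuming pandas and numpy are available): mapping to str also stringifies missing values, which is exactly why the cleanup step that follows exists.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": [" x ", np.nan]})
toy["a"] = toy["a"].map(str)        # NaN becomes the string 'nan'
toy["a"] = toy["a"].map(str.strip)
rows = [tuple(r) for r in toy.values]
print(rows)  # -> [('x',), ('nan',)]
```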

For the list of tuples, change all null-value signifiers - which were captured as strings in the conversion above - into a real None that can be passed to the target database. This was an issue for me; it might not be for you.

string_list = ['NaT', 'nan', 'NaN', 'None']

def remove_wrong_nulls(x):
    # swap any cell whose value is one of the null signifiers for None
    for i, e in enumerate(tuples):
        for j, k in enumerate(e):
            if k in x:
                temp = list(tuples[i])
                temp[j] = None
                tuples[i] = tuple(temp)

remove_wrong_nulls(string_list)
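An alternative sketch that does the same cleanup in a single pass while building the tuples (my variation, not the original answer's code; clean_rows and the toy rows are made up for illustration):

```python
string_list = ['NaT', 'nan', 'NaN', 'None']

def clean_rows(rows):
    """Replace string null signifiers with a real None in each row."""
    return [tuple(None if v in string_list else v for v in row) for row in rows]

# toy rows standing in for all_data.values
print(clean_rows([('x', 'nan'), ('y', 'NaT')]))  # -> [('x', None), ('y', None)]
```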

Create a connection to the database:

cnxn=ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
cursor = cnxn.cursor()

Define a function to split the list of tuples into chunks of 1000 rows, stored in new_list. This was necessary for me because the database's SQL query could not exceed 1 MB.

def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

new_list = chunks(tuples, 1000)
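A quick check of the chunking on a toy list (repeating the definition so the snippet is self-contained):

```python
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

print(chunks(list(range(5)), 2))  # -> [[0, 1], [2, 3], [4]]
```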

Define your query:

query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""

Run through new_list, which holds the tuples in groups of 1000, and perform executemany for each chunk. Follow this by committing and closing the connection, and that's it :)

for i in range(len(new_list)):
    cursor.executemany(query, new_list[i])
cnxn.commit()
cnxn.close()
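The same chunked executemany flow can be exercised end to end with Python's built-in sqlite3 module (a stand-in for the ceODBC connection here; the table and column names are made up):

```python
import sqlite3

def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

# toy rows standing in for the cleaned list of tuples
tuples = [(str(i), 'row%d' % i) for i in range(2500)]

cnxn = sqlite3.connect(':memory:')
cursor = cnxn.cursor()
cursor.execute('create table t (a text, b text)')

query = 'insert into t (a, b) values (?, ?)'
for chunk in chunks(tuples, 1000):   # 3 chunks: 1000 + 1000 + 500
    cursor.executemany(query, chunk)
cnxn.commit()

cursor.execute('select count(*) from t')
count = cursor.fetchone()[0]
print(count)  # -> 2500
cnxn.close()
```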
Colin O'Brien answered Oct 04 '22