 

How to insert pandas dataframe via mysqldb into database?

I can connect to my local mysql database from python, and I can create, select from, and insert individual rows.

My question is: can I directly instruct mysqldb to take an entire dataframe and insert it into an existing table, or do I need to iterate over the rows?

In either case, what would the python script look like for a very simple table with ID and two data columns, and a matching dataframe?

Stefan asked May 10 '13 06:05



2 Answers

Update:

There is now a to_sql method, which is the preferred way to do this, rather than write_frame:

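    df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')

Also note: the syntax changed in pandas 0.14, which added SQLAlchemy support to to_sql. The flavor argument has since been removed, and current pandas expects an SQLAlchemy engine or connection as con (sqlite3 connections are also accepted).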

You can set up the connection with MySQLdb:

    from pandas.io import sql
    import MySQLdb

    con = MySQLdb.connect()  # may need to add some other options to connect

Setting the flavor of write_frame to 'mysql' means you can write to mysql:

    sql.write_frame(df, con=con, name='table_name_for_df',
                    if_exists='replace', flavor='mysql')

The argument if_exists tells pandas what to do if the table already exists:

if_exists: {'fail', 'replace', 'append'}, default 'fail'
     fail: If table exists, raise a ValueError and write nothing.
     replace: If table exists, drop it, recreate it, and insert data.
     append: If table exists, insert data. Create if does not exist.
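
The three modes can be tried without a MySQL server. The following is a minimal sketch, assuming a recent pandas and using an in-memory SQLite connection as a stand-in for MySQL; the table name demo and the helper row_count are made up for illustration. In current pandas versions, 'fail' raises a ValueError when the table already exists.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for MySQL so this runs without a server.
con = sqlite3.connect(":memory:")
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})


def row_count(connection, table):
    """Hypothetical helper: number of rows currently in `table`."""
    return connection.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


# The table does not exist yet, so 'fail' simply creates and fills it.
df.to_sql("demo", con, if_exists="fail", index=False)
count_after_create = row_count(con, "demo")

# A second 'fail' write hits the existing table and raises ValueError.
try:
    df.to_sql("demo", con, if_exists="fail", index=False)
    fail_raised = False
except ValueError:
    fail_raised = True

# 'append' keeps the existing rows and adds the new ones.
df.to_sql("demo", con, if_exists="append", index=False)
count_after_append = row_count(con, "demo")

# 'replace' drops the table and recreates it from the frame being written.
df.head(1).to_sql("demo", con, if_exists="replace", index=False)
count_after_replace = row_count(con, "demo")
```

Because 'replace' drops the table first, any column types, keys, or indexes defined by hand on the old table are lost; 'append' is the safer choice for a table that already has a schema.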

Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.

Andy Hayden answered Sep 23 '22 15:09


Andy Hayden mentioned the correct function (to_sql). In this answer, I'll give a complete example, which I tested with Python 3.5; it should also work with Python 2.7 and other Python 3.x versions:

First, let's create the dataframe:

    # Create dataframe
    import pandas as pd
    import numpy as np

    np.random.seed(0)
    number_of_samples = 10
    frame = pd.DataFrame({
        'feature1': np.random.random(number_of_samples),
        'feature2': np.random.random(number_of_samples),
        'class':    np.random.binomial(2, 0.1, size=number_of_samples),
        }, columns=['feature1', 'feature2', 'class'])

    print(frame)

Which gives:

       feature1  feature2  class
    0  0.548814  0.791725      1
    1  0.715189  0.528895      0
    2  0.602763  0.568045      0
    3  0.544883  0.925597      0
    4  0.423655  0.071036      0
    5  0.645894  0.087129      0
    6  0.437587  0.020218      0
    7  0.891773  0.832620      1
    8  0.963663  0.778157      0
    9  0.383442  0.870012      0

To import this dataframe into a MySQL table:

    # Import dataframe into MySQL
    import sqlalchemy

    database_username = 'ENTER USERNAME'
    database_password = 'ENTER USERNAME PASSWORD'
    database_ip       = 'ENTER DATABASE IP'
    database_name     = 'ENTER DATABASE NAME'
    database_connection = sqlalchemy.create_engine(
        'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
            database_username, database_password,
            database_ip, database_name))
    frame.to_sql(con=database_connection, name='table_name_for_df', if_exists='replace')
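
A connection string assembled with plain string formatting breaks if the username or password contains characters such as @ or /. The sketch below shows one way to guard against that by percent-encoding the credentials with the standard library before formatting the URL. The helper name build_mysql_url and the sample credentials are hypothetical, not part of pandas or SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_mysql_url(username, password, host, database, driver="mysqlconnector"):
    """Hypothetical helper: build an SQLAlchemy-style MySQL URL.

    The username and password are percent-encoded so that characters
    like '@' or '/' are not mistaken for URL delimiters.
    """
    return "mysql+{driver}://{user}:{password}@{host}/{database}".format(
        driver=driver,
        user=quote_plus(username),
        password=quote_plus(password),
        host=host,
        database=database,
    )


# Made-up credentials, for illustration only.
url = build_mysql_url("stefan", "p@ss/word", "localhost", "testdb")
print(url)
```

The resulting string can be passed to sqlalchemy.create_engine in place of the formatted string used above.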

One catch is that the original MySQLdb package doesn't work with Python 3.x (its fork, mysqlclient, does). So instead we use mysqlconnector, which may be installed as follows:

    pip install mysql-connector==2.1.4  # version avoids Protobuf error


Note that to_sql creates the table as well as the columns if they do not already exist in the database.
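
For the case in the question (an existing table with an ID and two data columns), if_exists='append' together with index=False writes the rows into the table as it is, without dropping or recreating it. The following is a minimal sketch under stated assumptions: an in-memory SQLite connection stands in for MySQL so the code runs without a server, and the table name measurements and its column names are invented for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL database.
con = sqlite3.connect(":memory:")

# The pre-existing table: an ID column plus two data columns.
con.execute(
    "CREATE TABLE measurements ("
    "id INTEGER PRIMARY KEY, data1 REAL, data2 REAL)"
)

# A DataFrame whose column names match the table's columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "data1": [0.1, 0.2, 0.3],
    "data2": [1.0, 2.0, 3.0],
})

# index=False stops pandas from writing the DataFrame index as an extra
# column, which the existing table does not have.
df.to_sql("measurements", con, if_exists="append", index=False)

rows = con.execute(
    "SELECT id, data1, data2 FROM measurements ORDER BY id"
).fetchall()
```

With a MySQL database, the same to_sql call should work with an SQLAlchemy engine as con; the DataFrame column names must match the table's column names.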

Franck Dernoncourt answered Sep 23 '22 15:09