I can connect to my local mysql database from python, and I can create, select from, and insert individual rows.
My question is: can I directly instruct mysqldb to take an entire dataframe and insert it into an existing table, or do I need to iterate over the rows?
In either case, what would the python script look like for a very simple table with ID and two data columns, and a matching dataframe?
Create a dataframe by calling the pandas dataframe constructor and passing the python dict object as data. Invoke to_sql() method on the pandas dataframe instance and specify the table name and database connection. This creates a table in MySQL database server and populates it with the data from the pandas dataframe.
There is now a to_sql
method, which is the preferred way to do this, rather than write_frame
:
df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')
Also note: the syntax may change in pandas 0.14...
You can set up the connection with MySQLdb:
from pandas.io import sql import MySQLdb con = MySQLdb.connect() # may need to add some other options to connect
Setting the flavor
of write_frame
to 'mysql'
means you can write to mysql:
sql.write_frame(df, con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')
The argument if_exists
tells pandas how to deal if the table already exists:
if_exists: {'fail', 'replace', 'append'}
, default'fail'
fail
: If table exists, do nothing.
replace
: If table exists, drop it, recreate it, and insert data.
append
: If table exists, insert data. Create if does not exist.
Although the write_frame
docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.
Andy Hayden mentioned the correct function (to_sql
). In this answer, I'll give a complete example, which I tested with Python 3.5 but should also work for Python 2.7 (and Python 3.x):
First, let's create the dataframe:
# Create dataframe import pandas as pd import numpy as np np.random.seed(0) number_of_samples = 10 frame = pd.DataFrame({ 'feature1': np.random.random(number_of_samples), 'feature2': np.random.random(number_of_samples), 'class': np.random.binomial(2, 0.1, size=number_of_samples), },columns=['feature1','feature2','class']) print(frame)
Which gives:
feature1 feature2 class 0 0.548814 0.791725 1 1 0.715189 0.528895 0 2 0.602763 0.568045 0 3 0.544883 0.925597 0 4 0.423655 0.071036 0 5 0.645894 0.087129 0 6 0.437587 0.020218 0 7 0.891773 0.832620 1 8 0.963663 0.778157 0 9 0.383442 0.870012 0
To import this dataframe into a MySQL table:
# Import dataframe into MySQL import sqlalchemy database_username = 'ENTER USERNAME' database_password = 'ENTER USERNAME PASSWORD' database_ip = 'ENTER DATABASE IP' database_name = 'ENTER DATABASE NAME' database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'. format(database_username, database_password, database_ip, database_name)) frame.to_sql(con=database_connection, name='table_name_for_df', if_exists='replace')
One trick is that MySQLdb doesn't work with Python 3.x. So instead we use mysqlconnector
, which may be installed as follows:
pip install mysql-connector==2.1.4 # version avoids Protobuf error
Output:
Note that to_sql
creates the table as well as the columns if they do not already exist in the database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With