How to insert a Pandas Dataframe into MySql using PyMySQL

I have a DataFrame with around 30,000+ rows and 150+ columns. Currently I am using the following code to insert the data into MySQL, but since it inserts the rows one at a time, it takes too long to load them all.

Is there any way to insert the rows all at once or in batches? The constraint is that I can only use PyMySQL; I cannot install any other library.

import pymysql
import pandas as pd

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})


# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')


# create cursor
cursor=connection.cursor()

# creating column list for insertion
cols = "`,`".join([str(i) for i in data.columns.tolist()])

# Insert DataFrame records one by one.
for i,row in data.iterrows():
    sql = "INSERT INTO `book_details` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
    cursor.execute(sql, tuple(row))

    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

connection.close()

Thank You.

John Doe asked Oct 04 '19



3 Answers

Try using SQLAlchemy to create an engine that you can later pass to the pandas df.to_sql function. This function writes the rows of a pandas DataFrame to a SQL database, and it is much faster than iterating over your DataFrame and inserting each row through the MySQL cursor.

Your code would look something like this:

import pymysql
import pandas as pd
from sqlalchemy import create_engine

# Create dataframe
data = pd.DataFrame({
    'book_id':[12345, 12346, 12347],
    'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price':[29, 23, 27]
})

db_data = 'mysql+pymysql://root:12345@localhost:3306/book?charset=utf8mb4'
engine = create_engine(db_data)

# Connect to the database
connection = pymysql.connect(host='localhost',
                         user='root',
                         password='12345',
                         db='book')    

# create cursor
cursor = connection.cursor()
# Use to_sql to write the DataFrame into the SQL table
data.to_sql('book_details', engine, if_exists='append', index=False)    

# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

engine.dispose()
connection.close()

You can take a look at all the options this function has in the pandas docs.
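As a usage note, df.to_sql can also do the batching itself: chunksize bounds how many rows are sent per round trip, and method='multi' packs each chunk into a single multi-row INSERT instead of one statement per row. A small sketch (it uses an in-memory SQLite engine so it runs without a MySQL server; swap in the MySQL engine from the answer above for real use):

```python
import pandas as pd
from sqlalchemy import create_engine

data = pd.DataFrame({
    'book_id': [12345, 12346, 12347],
    'title': ['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price': [29, 23, 27]
})

# In-memory SQLite engine so the snippet is self-contained;
# replace with the MySQL engine for the real database.
engine = create_engine('sqlite://')

# chunksize limits rows per round trip; method='multi' sends each
# chunk as one multi-row INSERT statement.
data.to_sql('book_details', engine, if_exists='append', index=False,
            chunksize=1000, method='multi')

result = pd.read_sql('SELECT * FROM book_details', engine)
```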

nacho answered Oct 19 '22


It is faster to push a file to the SQL server and let the server manage the input.

So first push the data to a CSV file.

data.to_csv("import-data.csv", header=False, index=False, quoting=2, na_rep="\\N")

And then load it at once into the SQL table.

sql = "LOAD DATA LOCAL INFILE 'import-data.csv' \
    INTO TABLE book_details FIELDS TERMINATED BY ',' ENCLOSED BY '\"' \
    (`" + cols + "`)"
cursor.execute(sql)
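One caveat with this approach: both the MySQL server and the client must explicitly allow LOCAL INFILE, or the statement is rejected. In PyMySQL the client side is the local_infile flag on connect(). A sketch of the statement construction, reusing the table and columns from the question (the connect call is commented out since it needs a live server):

```python
# PyMySQL refuses to send a local file unless told to; pass local_infile=True.
# The server must also have its own local_infile setting enabled.
# connection = pymysql.connect(host='localhost', user='root',
#                              password='12345', db='book', local_infile=True)

cols = "`,`".join(['book_id', 'title', 'price'])
sql = (
    "LOAD DATA LOCAL INFILE 'import-data.csv' "
    "INTO TABLE book_details "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    f"(`{cols}`)"
)
```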
organicData answered Oct 19 '22


Possible improvements.

  • Remove or disable indexes on the table(s)
  • Take the commit out of the loop

Now try and load the data.

Generate a CSV file and load it using **LOAD DATA INFILE** - this would be issued from within MySQL.
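Staying within the question's PyMySQL-only constraint, both bullet points (batching, and committing once instead of per row) can be sketched with cursor.executemany, which sends many parameter tuples for one statement. A sketch; the book_details table and credentials are taken from the question:

```python
import pandas as pd

data = pd.DataFrame({
    'book_id': [12345, 12346, 12347],
    'title': ['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
    'price': [29, 23, 27]
})

# Build the INSERT statement once, outside any loop.
cols = "`,`".join(str(c) for c in data.columns)
placeholders = ",".join(["%s"] * len(data.columns))
sql = f"INSERT INTO `book_details` (`{cols}`) VALUES ({placeholders})"

# Plain tuples are much cheaper to produce than per-row iterrows() Series.
rows = [tuple(r) for r in data.itertuples(index=False, name=None)]

def insert_in_batches(connection, batch_size=1000):
    """Send rows in chunks via executemany, committing once at the end."""
    with connection.cursor() as cursor:
        for start in range(0, len(rows), batch_size):
            cursor.executemany(sql, rows[start:start + batch_size])
    connection.commit()

# Needs a live server, e.g.:
# import pymysql
# insert_in_batches(pymysql.connect(host='localhost', user='root',
#                                   password='12345', db='book'))
```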

Tim Seed answered Oct 19 '22