How to create a new table in a MySQL DB from a pandas dataframe

I recently transitioned from using SQLite for most of my data storage and management needs to MySQL. I think I've finally gotten the correct libraries installed to work with Python 3.6, but now I am having trouble creating a new table from a dataframe in the MySQL database.

Here are the libraries I import:

import pandas as pd
import mysql.connector
from sqlalchemy import create_engine

In my code, I first create a dataframe from a CSV file (no issues here).

def csv_to_df(infile):
    return pd.read_csv(infile)

Then I create a SQLAlchemy engine for the MySQL database with this function:

def mysql_connection():
    user = 'root'
    password = 'abc'
    host = '127.0.0.1'
    port = '3306'
    database = 'a001_db'
    engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
    return engine
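
A note on the URL above: a bare mysql:// scheme makes SQLAlchemy use its default MySQL driver (mysqlclient/MySQLdb), so the engine does not use the imported mysql.connector. The sketch below builds a URL that names the driver explicitly and percent-encodes the credentials. The helper name build_mysql_url is made up for illustration, and it assumes MySQL Connector/Python is the driver you want.

```python
from urllib.parse import quote_plus

def build_mysql_url(user, password, host, port, database,
                    driver="mysqlconnector"):
    """Return a SQLAlchemy-style URL such as mysql+mysqlconnector://...

    build_mysql_url is a hypothetical helper, not part of SQLAlchemy.
    """
    # Percent-encode credentials so characters like '@' or '/' in a
    # password do not break URL parsing.
    return "mysql+{0}://{1}:{2}@{3}:{4}/{5}?charset=utf8".format(
        driver, quote_plus(user), quote_plus(password), host, port, database)

url = build_mysql_url("root", "p@ss/word", "127.0.0.1", 3306, "a001_db")
print(url)
```

The resulting string can be passed to create_engine in place of the hand-formatted one.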

Lastly, I use the pandas method to_sql to create the table in the MySQL database:

def df_to_mysql(df, db_tbl_name, conn=mysql_connection(), index=False):
    df.to_sql(con = conn, name = db_tbl_name, if_exists='replace', index = False)
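
One detail in the signature above is worth knowing: Python evaluates a default argument once, when the def statement runs, so conn=mysql_connection() builds the engine at import time rather than on each call. A minimal sketch of that behavior follows. make_engine is a hypothetical stand-in that only counts calls; it is not a real SQLAlchemy function.

```python
calls = []

def make_engine():
    # Hypothetical stand-in for mysql_connection(); records each call.
    calls.append("created")
    return object()

def eager(conn=make_engine()):      # default built once, at definition time
    return conn

def lazy(conn=None):                # default built on demand, per call
    if conn is None:
        conn = make_engine()
    return conn

count_after_defs = len(calls)       # 1: eager's default already ran
same_object = eager() is eager()    # True: both calls share one default
lazy(); lazy()
count_after_lazy = len(calls)       # 3: each lazy() call made a new one
```

The conn=None pattern defers engine creation until the function is actually called.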

I run the code using this line:

df_to_mysql(csv_to_df(r'path/to/file.csv'), 'new_database_table')

This yields the following error:

InvalidRequestError: Could not reflect: requested table(s) not available in Engine(mysql://root:***@127.0.0.1:3306/a001_db?charset=utf8): (new_database_table)

I think this is telling me that I must first create a table in the database before passing the data in the dataframe to this table, but I'm not 100% positive about that. Regardless, I'm looking for a way to create a table in a MySQL database without manually creating the table first (I have many CSVs, each with 50+ fields, that have to be uploaded as new tables in a MySQL database).

Any suggestions?

asked Dec 17 '22 22:12 by DonnRK


1 Answer

I took the approach suggested by aws_apprentice above: create the table first, then write the data to it.

The code below first auto-generates a MySQL table from a DataFrame (deriving column names and data types automatically), then writes the DataFrame's data to that table.

There were a couple of hiccups I had to overcome, such as unnamed CSV columns and determining the correct data type for each field in the MySQL table.

I'm sure there are multiple other (better?) ways to do this, but this seems to work.

import pandas as pd
from sqlalchemy import create_engine

infile = r'path/to/file.csv'
db = 'a001_db'
db_tbl_name = 'a001_rd004_db004'

'''
Load a CSV file into a DataFrame. If the CSV has no header row, pass the column names via the headers argument. Rename pandas' auto-generated "Unnamed: N" columns so they are easier to use as MySQL column names.
'''
def csv_to_df(infile, headers=None):
    if not headers:
        df = pd.read_csv(infile)
    else:
        df = pd.read_csv(infile, header=None)
        df.columns = headers
    # "Unnamed: 3" -> "Unnamed3". rename() ignores keys that are not present,
    # so no try/except is needed, and this covers any number of columns.
    unnamed = {c: c.replace('Unnamed: ', 'Unnamed')
               for c in df.columns
               if isinstance(c, str) and c.startswith('Unnamed: ')}
    return df.rename(columns=unnamed)

'''
Create a mapping of df dtypes to mysql data types (not perfect, but close enough)
'''
def dtype_mapping():
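    # Keys must match str(dtype) exactly as pandas reports it,
    # e.g. 'datetime64[ns]' rather than 'datetime64'.
    return {'object' : 'TEXT',
        'int64' : 'INT',  # BIGINT if values can exceed the 32-bit range
        'float64' : 'FLOAT',  # DOUBLE keeps full 64-bit precision
        'datetime64[ns]' : 'DATETIME',
        'bool' : 'TINYINT',
        'category' : 'TEXT',
        'timedelta64[ns]' : 'TEXT'}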
'''
Create a sqlalchemy engine
'''
def mysql_engine(user = 'root', password = 'abc', host = '127.0.0.1', port = '3306', database = 'a001_db'):
    engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
    return engine

'''
Create a mysql connection from sqlalchemy engine
'''
def mysql_conn(engine):
    conn = engine.raw_connection()
    return conn

'''
Create the SQL fragment listing column names and types
'''
def gen_tbl_cols_sql(df):
    dmap = dtype_mapping()
    sql = "pi_db_uid INT AUTO_INCREMENT PRIMARY KEY"
    df1 = df.rename(columns = {"" : "nocolname"})
    # Backtick-quote each column name so spaces and reserved words are accepted;
    # fall back to TEXT for any dtype missing from the mapping.
    for hdr, dtype in df1.dtypes.items():
        sql += ", `{0}` {1}".format(hdr, dmap.get(str(dtype), 'TEXT'))
    return sql

'''
Create a mysql table from a df
'''
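def create_mysql_tbl_schema(df, conn, db, tbl_name):
    tbl_cols_sql = gen_tbl_cols_sql(df)
    # Qualify the table with the database name instead of sending
    # "USE db; CREATE TABLE ..." as one string, because not every driver
    # accepts multiple statements in a single execute() call.
    sql = "CREATE TABLE `{0}`.`{1}` ({2})".format(db, tbl_name, tbl_cols_sql)
    cur = conn.cursor()
    cur.execute(sql)
    cur.close()
    conn.commit()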
   
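'''
Write df data to the newly created MySQL table
'''
def df_to_mysql(df, engine, tbl_name):
    # 'append' inserts into the existing table; 'replace' would drop it
    # and let pandas recreate it with its own inferred schema.
    # index=False because the table has no column for the DataFrame index.
    df.to_sql(tbl_name, engine, if_exists='append', index=False)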
    
df = csv_to_df(infile)
engine = mysql_engine()  # create the engine once and reuse it
conn = mysql_conn(engine)
create_mysql_tbl_schema(df, conn, db, db_tbl_name)
conn.close()
df_to_mysql(df, engine, db_tbl_name)
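
As a possible refinement of the column-renaming step, here is a minimal sketch of a more general cleanup for CSV headers. sanitize_columns is a hypothetical helper, and its rules (underscores for anything other than letters, digits, and underscore; 'nocolname' for empty headers; numeric suffixes for duplicates) are assumptions chosen for illustration, not something MySQL requires. The 64-character cap matches MySQL's identifier length limit.

```python
import re

def sanitize_columns(names, max_len=64):
    """Return MySQL-friendly column names (hypothetical helper)."""
    seen = {}
    result = []
    for raw in names:
        # Collapse runs of non-word characters into '_' and trim the ends;
        # an empty result falls back to 'nocolname'.
        name = re.sub(r'\W+', '_', str(raw)).strip('_') or 'nocolname'
        name = name[:max_len]
        count = seen.get(name, 0)
        seen[name] = count + 1
        if count:
            # Duplicate: add a numeric suffix, staying within max_len.
            # Suffixed names are not re-checked against later columns.
            suffix = '_{0}'.format(count)
            name = name[:max_len - len(suffix)] + suffix
        result.append(name)
    return result

cleaned = sanitize_columns(['Unnamed: 0', 'unit price', '', 'unit price'])
print(cleaned)
```

If you try this, assign the result back with df.columns = sanitize_columns(df.columns) before both generating the schema and calling to_sql, so the table and the DataFrame agree on column names.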
answered Jan 23 '23 05:01 by DonnRK