Read External SQL File into Pandas Dataframe

This is a simple question that I haven't been able to find an answer to. I have a .SQL file with two commands. I'd like to have Pandas pull the result of those commands into a DataFrame.

The SQL file contains the following two commands; the longer query uses today's date.

SET @todaydate = DATE(NOW());
SELECT ...long query....;

I've attempted to use read_sql in the following way after establishing my connection (prod_db), and I get the error message "'NoneType' object is not iterable":

sqlpath = 'path.sql'
scriptFile = open(sqlpath,'r')
script = scriptFile.read()
df = pd.read_sql(script,prod_db) 

I've also tried the function and approach described in the question "reading external sql script in python", but I'm not sure how to get the result into a pandas DataFrame (or perhaps I'm missing something). It doesn't seem to be reading the results, as I get 'Command skipped' repeatedly.

# c is assumed to be an existing database cursor
def executeScriptsFromFile(filename):
    # Read the whole script, closing the file afterwards
    with open(filename, 'r') as fd:
        sqlFile = fd.read()
    # all SQL commands (split on ';')
    sqlCommands = sqlFile.split(';')
    # Execute every command from the input file
    for command in sqlCommands:
        try:
            c.execute(command)
        except OperationalError as msg:
            print("Command skipped: ", msg)

df = executeScriptsFromFile(sqlpath)

scoloe asked Oct 11 '17 17:10


2 Answers

I have a solution that might work for you. It should give you a nice little pandas.DataFrame.

First, read the query from the SQL file. Then pass it to pd.read_sql_query() instead of pd.read_sql().

I am sure you know it, but here is the doc for the function: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql_query.html#pandas.read_sql_query

import pandas as pd

# Read the SQL file
with open('filename.sql', 'r') as query_file:
    query = query_file.read()

# connection == the connection to your database, in your case prod_db
DF = pd.read_sql_query(query, connection)

I can confirm that this works with T-SQL, but I have never tried it with MySQL.
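
The script in the question holds two statements, and the first (SET) returns no rows, so it may help to run everything except the final SELECT through a cursor and hand only that SELECT to pandas. Below is a minimal sketch of that idea, not a tested MySQL recipe: it uses an in-memory sqlite3 database as a stand-in so it runs anywhere, the helper name read_sql_script and the orders table are made up for illustration, and the split on ';' assumes no semicolons appear inside string literals. With MySQL, the assumption is that prod_db is a DB-API connection and that the SET and the SELECT run on that same connection, since user-defined variables are scoped to the session.

```python
import sqlite3

import pandas as pd


def read_sql_script(script, conn):
    """Run every statement in `script`; return the last one's result as a DataFrame."""
    # Naive split on ';' (assumes no semicolons inside string literals)
    statements = [s.strip() for s in script.split(';') if s.strip()]
    *setup, final_query = statements

    cursor = conn.cursor()
    for statement in setup:
        # Setup statements (SET, CREATE, INSERT, ...) return no rows, so run them directly
        cursor.execute(statement)
    cursor.close()

    # Only the final SELECT is handed to pandas
    return pd.read_sql_query(final_query, conn)


# Stand-in script: two setup statements followed by the SELECT to load
script = """
CREATE TABLE orders (id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 9.5), (2, 20.0);
SELECT id, amount FROM orders ORDER BY id;
"""

conn = sqlite3.connect(':memory:')
df = read_sql_script(script, conn)
conn.close()
print(df)
```

For the file in the question, the equivalent call would be something like read_sql_script(script, prod_db), with script read from path.sql.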

Geof answered Oct 25 '22 11:10


This is an MWE (minimal working example) of how it worked for me:

import logging

import pandas as pd
import pymssql

# Placeholder connection settings; replace with your own values
db_config = {
    'server': 'server address',
    'port': 'port',
    'user': 'user',
    'password': 'password',
    'database': 'db name',
}

# Read the whole query from the .sql file
with open('./query_file.sql', 'r') as query_file:
    query = query_file.read()

sql_conn = None
try:
    sql_conn = pymssql.connect(**db_config)
    logging.info('SQL connection is opened')
    avise_me_df = pd.read_sql(query, sql_conn)
    logging.info('pandas df recorded')
except pymssql.OperationalError:
    logging.error('Error reading data from SQL table')
finally:
    # Close the connection only if it was actually opened
    if sql_conn is not None:
        sql_conn.close()
        logging.info('SQL connection is closed')

I hope this might help.
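
If the only job of the SET statement in the question is to supply today's date, another option may be to compute the date in Python and bind it as a query parameter, which leaves a single SELECT for pandas to read. The following is a minimal sketch under stated assumptions: it runs against an in-memory sqlite3 database rather than MySQL or SQL Server, and the events table and its columns are hypothetical. The placeholder style depends on the driver: sqlite3 uses ?, while common MySQL drivers and pymssql use %s.

```python
import sqlite3
from datetime import date

import pandas as pd

conn = sqlite3.connect(':memory:')
today = date.today().isoformat()

# Hypothetical table with one old row and one row dated today
conn.execute('CREATE TABLE events (name TEXT, event_date TEXT)')
conn.executemany(
    'INSERT INTO events VALUES (?, ?)',
    [('old', '2000-01-01'), ('new', today)],
)

# The date is bound as a parameter instead of a SQL session variable
query = 'SELECT name, event_date FROM events WHERE event_date = ?'
df = pd.read_sql_query(query, conn, params=(today,))
conn.close()
print(df)
```

Binding the value this way also avoids building the SQL string by hand, so the query file can stay a single statement.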

Miguel Rueda answered Oct 25 '22 12:10