The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.
However, I am unable to find any good examples for working with databases like MySQL for example.
Can anyone point me to links or give some code snippets of how to convert query results using mysql-python to data frames in Pandas efficiently ?
What they have in common is that both Pandas and SQL operate on tabular data (i.e. tables consist of rows and columns). Both Pandas and SQL are essential tools for data scientists and analysts. There are, of course, alternatives for both but they are the predominant ones in the field.
The Pandas is a popular data analysis module that helps users to deal with structured data with simple commands. Using the Pandas dataframe, you can load data from CSV files or any database into the Python code and then perform operations on it.
Pandas support writing dataframes into MySQL database tables as well as loading from them.
This main difference can mean that the two tools are separate, however, you can also perform several of the same functions in each respective tool, for example, you can create new features from existing columns in pandas, perhaps easier and faster than in SQL.
As Wes says, io/sql's read_sql will do it, once you've gotten a database connection using a DBI compatible library. We can look at two short examples using the MySQLdb
and cx_Oracle
libraries to connect to Oracle and MySQL and query their data dictionaries. Here is the example for cx_Oracle
:
import pandas as pd import cx_Oracle ora_conn = cx_Oracle.connect('your_connection_string') df_ora = pd.read_sql('select * from user_objects', con=ora_conn) print 'loaded dataframe from Oracle. # Records: ', len(df_ora) ora_conn.close()
And here is the equivalent example for MySQLdb
:
import MySQLdb mysql_cn= MySQLdb.connect(host='myhost', port=3306,user='myusername', passwd='mypassword', db='information_schema') df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn) print 'loaded dataframe from MySQL. records:', len(df_mysql) mysql_cn.close()
For recent readers of this question: pandas have the following warning in their docs for version 14.0:
Warning: Some of the existing functions or function aliases have been deprecated and will be removed in future versions. This includes: tquery, uquery, read_frame, frame_query, write_frame.
And:
Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has been deprecated. MySQL will be further supported with SQLAlchemy engines (GH6900).
This makes many of the answers here outdated. You should use sqlalchemy
:
from sqlalchemy import create_engine import pandas as pd engine = create_engine('dialect://user:pass@host:port/schema', echo=False) f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With