Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python-pandas and databases like mysql

Tags:

python

pandas

The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.

However, I am unable to find any good examples for working with databases like MySQL for example.

Can anyone point me to links or give some code snippets of how to convert query results using mysql-python to data frames in Pandas efficiently ?

like image 428
user1320615 Avatar asked Apr 08 '12 18:04

user1320615


People also ask

Is pandas and SQL similar?

What they have in common is that both Pandas and SQL operate on tabular data (i.e. tables consist of rows and columns). Both Pandas and SQL are essential tools for data scientists and analysts. There are, of course, alternatives for both but they are the predominant ones in the field.

Can pandas be used as database?

The Pandas is a popular data analysis module that helps users to deal with structured data with simple commands. Using the Pandas dataframe, you can load data from CSV files or any database into the Python code and then perform operations on it.

Does pandas work with MySQL?

Pandas support writing dataframes into MySQL database tables as well as loading from them.

Which is better SQL or pandas?

This main difference can mean that the two tools are separate, however, you can also perform several of the same functions in each respective tool, for example, you can create new features from existing columns in pandas, perhaps easier and faster than in SQL.


2 Answers

As Wes says, io/sql's read_sql will do it, once you've gotten a database connection using a DBI compatible library. We can look at two short examples using the MySQLdb and cx_Oracle libraries to connect to Oracle and MySQL and query their data dictionaries. Here is the example for cx_Oracle:

import pandas as pd import cx_Oracle  ora_conn = cx_Oracle.connect('your_connection_string') df_ora = pd.read_sql('select * from user_objects', con=ora_conn)     print 'loaded dataframe from Oracle. # Records: ', len(df_ora) ora_conn.close() 

And here is the equivalent example for MySQLdb:

import MySQLdb mysql_cn= MySQLdb.connect(host='myhost',                  port=3306,user='myusername', passwd='mypassword',                  db='information_schema') df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)     print 'loaded dataframe from MySQL. records:', len(df_mysql) mysql_cn.close() 
like image 136
Keith C Campbell Avatar answered Oct 16 '22 03:10

Keith C Campbell


For recent readers of this question: pandas have the following warning in their docs for version 14.0:

Warning: Some of the existing functions or function aliases have been deprecated and will be removed in future versions. This includes: tquery, uquery, read_frame, frame_query, write_frame.

And:

Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has been deprecated. MySQL will be further supported with SQLAlchemy engines (GH6900).

This makes many of the answers here outdated. You should use sqlalchemy:

from sqlalchemy import create_engine import pandas as pd engine = create_engine('dialect://user:pass@host:port/schema', echo=False) f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID') 
like image 42
Korem Avatar answered Oct 16 '22 02:10

Korem