How to convert a select query into a pandas DataFrame using PeeWee

Using the PeeWee ORM I have the following query:

query = DataModel.select().where(DataModel.field == "value")

Is there any way to convert query into a pandas DataFrame without iterating over all the values? I'm looking for a more "Pythonic" way of doing this.

MikeyE asked Mar 04 '17 12:03



3 Answers

Assuming query is of type peewee.SelectQuery, you could do:

df = pd.DataFrame(list(query.dicts()))

EDIT: As Nicola points out below, you're now able to do pd.DataFrame(query.dicts()) directly.

Greg Reda answered Oct 10 '22 09:10


Just in case someone finds this useful, I was searching for the same conversion but in Python 3. Inspired by @toto_tico's previous answer, this is what I came up with:

import pandas
import peewee


def data_frame_from_peewee_query(query: peewee.Query) -> pandas.DataFrame:
    connection = query._database.connection()  # noqa
    sql, params = query.sql()
    return pandas.read_sql_query(sql, connection, params=params)

Checked with Python 3.9.6, pandas==1.3.2 and peewee==3.14.4, using peewee.SqliteDatabase.

franferrax answered Oct 10 '22 09:10


The following is more efficient, because it avoids building an intermediate list before passing the data to pandas. It also has the side benefit of preserving the order of the columns:

sql, params = query.sql()
df = pd.read_sql(sql, database.connection(), params=params)

You need direct access to the peewee database object. In the quickstart tutorial, for example, that is:

db = SqliteDatabase('people.db')

Of course, you can also create your own connection to the database.

Drawback: be careful when the query joins tables that share column names, e.g. both id columns would appear in the DataFrame under the same name. Rename or alias those columns before continuing.


If you are using a peewee proxy:

import peewee as pw
database_proxy = pw.Proxy()

then the connection is here:

database_proxy.obj.connection()
toto_tico answered Oct 10 '22 10:10