I have a pandas DataFrame called df.
How can I do something like the following?
df.query("select * from df")
Thank you.
For those who know R, there is a library called sqldf where you can execute SQL code in R. My question is basically: is there some library like sqldf in Python?
pandasql works on both pandas DataFrames and Series. Its sqldf function queries DataFrames and takes two inputs: the SQL query string and the environment, passed via globals() or locals().
pandasql allows you to query pandas DataFrames using SQL syntax, similarly to sqldf in R. It seeks to provide a more familiar way of manipulating and cleaning data for people new to Python or pandas.
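As a minimal sketch of that two-argument call (the people DataFrame and its columns are made up purely for illustration):
import pandas as pd
from pandasql import sqldf

people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [34, 29]})

# sqldf takes the SQL string plus an environment dict so it can find the DataFrame by name
result = sqldf("SELECT name FROM people WHERE age > 30", locals())
print(result)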
This is not what pandas.query is supposed to do. You can look at the package pandasql (similar to sqldf in R):
import numpy as np
import pandas as pd
import pandasql as ps

df = pd.DataFrame([[1234, 'Customer A', '123 Street', np.nan],
                   [1234, 'Customer A', np.nan, '333 Street'],
                   [1233, 'Customer B', '444 Street', '333 Street'],
                   [1233, 'Customer B', '444 Street', '666 Street']],
                  columns=['ID', 'Customer', 'Billing Address', 'Shipping Address'])
q1 = """SELECT ID FROM df """
print(ps.sqldf(q1, locals()))
ID
0 1234
1 1234
2 1233
3 1233
Update 2020-07-10
With an updated pandasql you no longer need to pass the environment explicitly:
ps.sqldf("select * from df")
After some time of using this I realised the easiest way is to just do
from pandasql import sqldf
output = sqldf("select * from df")
It works like a charm, where df is a pandas DataFrame.
You can install pandasql from PyPI: https://pypi.org/project/pandasql/
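The install is a one-liner:
pip install pandasql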
A much better solution is to use duckdb. It is much faster than pandasql's sqldf because it does not have to load the entire dataset into SQLite and then back into pandas.
pip install duckdb
import pandas as pd
import duckdb
test_df = pd.DataFrame.from_dict({"i":[1, 2, 3, 4], "j":["one", "two", "three", "four"]})
duckdb.query("SELECT * FROM test_df where i>2").df() # returns a result dataframe
Performance improvement over pandasql, using the NYC yellow-cab trip data (~120 MB of CSV) as test data:
nyc = pd.read_csv('https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv',low_memory=False)
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
pysqldf("SELECT * FROM nyc where trip_distance>10")
# wall time 16.1s
duckdb.query("SELECT * FROM nyc where trip_distance>10").df()
# wall time 183ms
That is a speed improvement of roughly 100x.
This article gives good details and claims a 1000x improvement over pandasql: https://duckdb.org/2021/05/14/sql-on-pandas.html
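If you prefer not to rely on duckdb picking DataFrames out of the calling scope, you can also register them on a connection explicitly. A minimal sketch, reusing the test_df example from above (the in-memory connection and view name are just illustrative):
import pandas as pd
import duckdb

test_df = pd.DataFrame({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})

# open an in-memory database and expose the DataFrame under an explicit name
con = duckdb.connect()
con.register("test_df", test_df)

# run SQL against the registered view and pull the result back as a DataFrame
result = con.execute("SELECT * FROM test_df WHERE i > 2").df()
print(result)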
You can use DataFrame.query(condition) to return the subset of the DataFrame matching condition, like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'))
df
A B C
0 0 1 2
1 3 4 5
2 6 7 8
df.query('C < 6')
A B C
0 0 1 2
1 3 4 5
df.query('2*B <= C')
A B C
0 0 1 2
df.query('A % 2 == 0')
A B C
0 0 1 2
2 6 7 8
This has basically the same effect as an SQL statement, except that the SELECT * FROM df WHERE part is implied.
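query can also reference local Python variables with @ and, since pandas 0.25, column names that are not valid identifiers with backticks. A small sketch (the threshold variable and the 'total price' column are made up for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'))
threshold = 4

# @threshold pulls the Python variable into the query expression
print(df.query('B > @threshold'))

# backticks allow column names containing spaces
df2 = pd.DataFrame({'total price': [10, 25, 7]})
print(df2.query('`total price` > 9'))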