Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe.query method syntax

Tags:

python

pandas

Question:

I would like to gain a better understanding of the Pandas DataFrame.query method and what the following expression represents:

match = dfDays.query('index > @x.name & price >= @x.target')

What does @x.name represent?

I understand what the resulting output is for this code (a new column with pandas.tslib.Timestamp data) but don't have a clear understanding of the expression used to get this end result.

Data:

From here:

Vectorised way to query date and price data

np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31',freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks
               ,index=rng)
dfDays = pd.DataFrame({'price':ts2})
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

def find_match(x):
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))
like image 556
nipy Avatar asked Jan 29 '17 14:01

nipy


People also ask

How do I write a query in pandas?

Pandas query syntax Assuming you have a DataFrame, you need to call . query() using “dot syntax”. Basically, type the name of the DataFrame you want to subset, then type a “dot”, and then type the name of the method …. query() .

What is ILOC () in Python?

The iloc() function in python is one of the functions defined in the Pandas module that helps us to select a specific row or column from the data set. Using the iloc() function in python, we can easily retrieve any particular value from a row or column using index values.

Can we use SQL query in pandas DataFrame?

Pandasql is a python library that allows manipulation of a Pandas Dataframe using SQL. Under the hood, Pandasql creates an SQLite table from the Pandas Dataframe of interest and allow users to query from the SQLite table using SQL.

Is query faster than LOC?

The query function seams more efficient than the loc function. DF2: 2K records x 6 columns. The loc function seams much more efficient than the query function.


1 Answers

@x.name - @ helps .query() to understand that x is an external object (doesn't belong to the DataFrame for which the query() method was called). In this case x is a DataFrame. It could be a scalar value as well.

I hope this small demonstration will help you to understand it:

In [79]: d1
Out[79]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

In [80]: d2
Out[80]:
   a   x
0  1  10
1  7  11

In [81]: d1.query("a in @d2.a")
Out[81]:
   a  b  c
0  1  2  3
2  7  8  9

In [82]: d1.query("c < @d2.a")
Out[82]:
   a  b  c
1  4  5  6

Scalar x:

In [83]: x = 9

In [84]: d1.query("c == @x")
Out[84]:
   a  b  c
2  7  8  9
like image 59
MaxU - stop WAR against UA Avatar answered Sep 29 '22 12:09

MaxU - stop WAR against UA