Question:
I would like to gain a better understanding of the Pandas DataFrame.query method and what the following expression represents:
match = dfDays.query('index > @x.name & price >= @x.target')
What does @x.name represent?
I understand what the resulting output is for this code (a new column with pandas.tslib.Timestamp data) but don't have a clear understanding of the expression used to get this end result.
Data:
From here: Vectorised way to query date and price data
import numpy as np
import pandas as pd

np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks, index=rng)
dfDays = pd.DataFrame({'price': ts2})
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

def find_match(x):
    # x is one row of dfWeeks; return the first later day whose price reaches the target
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, axis=1))
Pandas query syntax: assuming you have a DataFrame, you call .query() using "dot syntax". Type the name of the DataFrame you want to subset, then a dot, then the method name, query(), with the filter condition passed as a string.
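As a minimal sketch of that call pattern (the DataFrame `df` and its columns are made up for illustration), the following filters rows with .query() and checks that the result matches ordinary boolean indexing:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30]})

# DataFrame name, dot, then query() with the condition as a string.
cheap = df.query("price < 3 and qty >= 10")

# The same filter written with a boolean mask.
same = df[(df["price"] < 3) & (df["qty"] >= 10)]

print(cheap.equals(same))
```

Inside the query string, bare names such as price and qty are looked up as columns of df.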
iloc is the integer-position-based indexer in pandas for selecting specific rows or columns from a data set. With iloc, any particular value can be retrieved from a row or column by its integer position.
Pandasql is a Python library that allows manipulation of a pandas DataFrame using SQL. Under the hood, Pandasql creates an SQLite table from the DataFrame of interest and allows users to query that table using SQL.
Relative performance depends on the data: in one comparison the query function seems more efficient than the loc function, while on DF2 (2K records x 6 columns) the loc function seems much more efficient than the query function.
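@x.name - the @ prefix tells .query() that x is an external object, i.e. a variable from the surrounding Python scope rather than a column of the DataFrame on which query() was called. In the question's code, x is one row of dfWeeks, passed to find_match as a Series by apply(..., axis=1), so @x.name is that row's index label (a Timestamp) and @x.target is its value in the target column. The external object can also be a DataFrame or a scalar value.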
I hope this small demonstration will help you to understand it:
In [79]: d1
Out[79]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

In [80]: d2
Out[80]:
   a   x
0  1  10
1  7  11

In [81]: d1.query("a in @d2.a")
Out[81]:
   a  b  c
0  1  2  3
2  7  8  9

In [82]: d1.query("c < @d2.a")
Out[82]:
   a  b  c
1  4  5  6

Scalar x:

In [83]: x = 9

In [84]: d1.query("c == @x")
Out[84]:
   a  b  c
2  7  8  9
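
Applied to the pattern in the question, here is a rough sketch of what x holds when find_match is called through apply with axis=1. The frames `days` and `weeks` are tiny made-up stand-ins for dfDays and dfWeeks, and returning pd.NaT when nothing matches is my own choice rather than what the original function does:

```python
import pandas as pd

# Hypothetical miniature of dfDays / dfWeeks, for illustration only.
days = pd.DataFrame(
    {"price": [1.0, 1.2, 1.8, 1.1, 2.0]},
    index=pd.date_range("2000-01-01", periods=5, freq="D"),
)
weeks = pd.DataFrame(
    {"price": [1.0, 1.1], "target": [1.5, 1.9]},
    index=pd.to_datetime(["2000-01-01", "2000-01-04"]),
)

def find_match(x):
    # x is one row of `weeks` as a Series: x.name is its index label
    # (a Timestamp) and x.target is the value in the "target" column.
    match = days.query("index > @x.name and price >= @x.target")
    # Assumption: return NaT instead of None when no day qualifies.
    return match.index[0] if not match.empty else pd.NaT

hits = weeks.apply(find_match, axis=1)
print(hits)
```

For each row of weeks, the query keeps only the days after that row's date whose price is at least that row's target, and the first such date is returned.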