I'd like to set the value of a column based on a query. I could probably use .where
to accomplish this, but the criteria for .query
are strings which are easier for me to maintain, especially when the criteria become complex.
import numpy as np
import pandas as pd
np.random.seed(51723)
df = pd.DataFrame(np.random.rand(n, 3), columns=list('abc'))
I'd like to make a new column, d, and set the value to 1 where these criteria are met:
criteria = '(a < b) & (b < c)'
Among other things, I've tried:
df['d'] = np.nan
df.query(criteria).loc[:,'d'] = 1
But that seems to do nothing except giving the SettingWithCopyWarning
even though I'm using .loc
And passing inplace
like this:
df.query(criteria, inplace=True).loc[:,'d'] = 1
Gives AttributeError: 'NoneType' object has no attribute 'loc'
AFAIK df.query()
returns a new DF, so try the following approach:
In [146]: df.loc[df.eval(criteria), 'd'] = 1
In [147]: df
Out[147]:
a b c d
0 0.175155 0.221811 0.808175 1.0
1 0.069033 0.484528 0.841618 1.0
2 0.174685 0.648299 0.904037 1.0
3 0.292404 0.423220 0.897146 1.0
4 0.169869 0.395967 0.590083 1.0
5 0.574394 0.804917 0.746797 NaN
6 0.642173 0.252437 0.847172 NaN
7 0.073629 0.821715 0.859776 1.0
8 0.999789 0.833708 0.230418 NaN
9 0.028163 0.666961 0.582713 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With