If a column in a pandas DataFrame has a categorical data type, how can I select rows using a comparison operator?
For example, if I have:
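import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=150), columns=['whatever'])
df_bins = np.linspace(df['whatever'].min(), df['whatever'].max(), 101)  # scalar endpoints give a 1-D array of edges
df['bin'] = pd.cut(df['whatever'], df_bins, include_lowest=True)  # include_lowest: the minimum gets a bin instead of NaN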
df['bin'] has an ordered categorical dtype whose categories are intervals. How can I select the rows whose intervals are, for example, greater than a certain value? If I do
df['bin']>50
I get an error saying that I cannot compare categorical with a scalar.
By converting the column to an IntervalIndex and comparing its left endpoints:
pd.IntervalIndex(df['bin']).left>50
Out[28]:
array([False, False, False, True, False, True, False, False, True,
False, False, False, False, False, False, True, True, False,
True, False, False, False, False, False, False, True, False,
False, True, False, False, False, False, False, False, False,
False, False, True, False, True, False, True, True, False,
False, False, False, False, False, True, False, False, True,
True, True, True, True, False, False, False, False, False,
False, False, True, False, False, True, True, False, False,
False, True, True, True, False, True, True, True, True,
False, True, False, True, True, False, True, True, False,
True, True, False, True, True, False, True, True, True,
False, True, True, False, False, False, True, False, True,
False, True, True, True, False, True, True, False, False,
False, True, True, True, False, False, True, False, True,
False, False, True, False, True, False, False, False, True,
False, True, False, False, True, False, True, False, False,
False, False, False, False, False, False])
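The boolean array above still has to be used as a mask on the DataFrame to actually select rows. Below is a minimal sketch of that step, assuming a small fixed sample (the values and the name `threshold` are hypothetical) so the result is reproducible. It also shows an alternative that works at the level of the categories: keep the intervals whose left endpoint exceeds the threshold, then use `Series.isin`. Both versions test where each interval *starts*; using `.right` instead of `.left` would test where it ends.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed sample instead of random data, so the output is reproducible
df = pd.DataFrame({"whatever": [3, 28, 48, 50, 52, 76, 99]})
df_bins = np.linspace(df["whatever"].min(), df["whatever"].max(), 101)
df["bin"] = pd.cut(df["whatever"], df_bins, include_lowest=True)

threshold = 50  # hypothetical cut-off

# Approach 1: row-level mask from the left endpoint of each row's interval
left = pd.IntervalIndex(df["bin"]).left
mask = np.asarray(left > threshold)
selected = df[mask]

# Approach 2: select the categories (intervals) whose left endpoint exceeds
# the threshold, then keep the rows belonging to one of those categories
cats = df["bin"].cat.categories
big_bins = cats[cats.left > threshold]
selected_isin = df[df["bin"].isin(big_bins)]

print(selected)
print(selected_isin)
```

Note that the row with value 50 is not selected: it falls in an interval whose left endpoint is below 50. Approach 2 only compares the 100 categories rather than every row, which may be convenient when the same bins are filtered repeatedly.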