pd.__version__: '0.14.0'
I need to do a not in statement for a column in a dataframe.
For the isin statement, I use the following to filter for the codes I need:
h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
I want to do a "not in" or "not equal to" (not sure which one Python uses) statement for another column.
So I tried the following:
h1 = df1[df1['csc_auth_12'].notin(['N6M','YEM','YEL','YEM'])]
h1 = df1[df1['csc_auth_12'] not in (['N6M','YEM','YEL','YEM'])]
and:
h1.query(['N6M','YEM','YEL','YEM'] not in ['csc_auth_12'])
I really want to filter out the N6M, YEM, YEL and YEM from the data set.
I'm also interested in how to do a "between" statement.
For the following, I had to type in all the 500 codes manually. I would like to do something like:
h1 = df1[df1['nat_actn_2_3'].isin['100','102'] and isbetween [500 & 599])]
But this is what I have:
h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104','107','108','112','115','117','120','122','124','128',
'130','132','132','140','141','142','143','145','146','147','148','149','170','171',
'172','173','179','190','198','199','501','502','503','504','505','506','507','508',
'509','510','511','512','513','514','515','516','517','518','519','520','521','522',
'523','524','525','526','527','528','529','530','531','532','533','534','535','536',
'537','538','539','540','541','542','543','544','545','546','547','548','549','550',
'551','552','553','554','555','556','557','558','559','560','561','562','563','564',
'565','566','567','568','569','570','571','572','573','574','575','576','577','578',
'579','580','581','582','583','584','585','586','587','588','589','590','591','592',
'593','594','595','596','597','598','599','702','721','740','953','955'])]
Any suggestions?
Thanks.
Negate the boolean condition using ~ to invert the mask:
h1 = df1[~df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
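Applied to the column you actually want to exclude codes from, a minimal sketch looks like this. The small df1 below is hypothetical sample data standing in for your real frame:

```python
import pandas as pd

# Hypothetical sample data; the real df1 comes from the question
df1 = pd.DataFrame({
    'csc_auth_12': ['N6M', 'ABC', 'YEM', 'XYZ', 'YEL'],
    'nat_actn_2_3': ['100', '505', '102', '702', '599'],
})

excluded = ['N6M', 'YEM', 'YEL']

# isin() builds a boolean mask; ~ flips it, keeping rows NOT in the list
h1 = df1[~df1['csc_auth_12'].isin(excluded)]
print(h1)
```

Only the ABC and XYZ rows survive, since every other row matches one of the excluded codes.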
As for notin and not in: the former doesn't exist (it raises an AttributeError), and the latter will raise a ValueError ("The truth value of a Series is ambiguous") because you're trying to use Python's in with an array, and pandas doesn't work like that.
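For the second question, you need to compound your boolean conditions. Because your codes are stored as strings, convert them to integers before comparing them with numbers:
h1 = df1[(df1['nat_actn_2_3'].isin(['100','102'])) | ((df1['nat_actn_2_3'].astype(int) > 500) & (df1['nat_actn_2_3'].astype(int) < 599))]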
I'm assuming from your text that you want rows that are either equal to 100/102 or between 500 and 599 (it's unclear whether you want to include those endpoints; if so, just change to >= and <= respectively).
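The range test can also be written with Series.between, which includes both endpoints by default. A minimal sketch, assuming every value in nat_actn_2_3 is a numeric string (astype(int) raises a ValueError otherwise); the sample codes are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'nat_actn_2_3': ['100', '101', '102', '499',
                                     '500', '550', '599', '702']})

# The codes are strings, so convert them once for the numeric range test
codes = df1['nat_actn_2_3'].astype(int)

# Keep 100 or 102, or anything from 500 to 599 inclusive
mask = df1['nat_actn_2_3'].isin(['100', '102']) | codes.between(500, 599)
h1 = df1[mask]
print(h1)
```

This keeps 100, 102, 500, 550 and 599, and replaces the hand-typed 501 to 599 list from the question.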
Here you use the bitwise operators & and | for and and or respectively. You also need to wrap each condition in parentheses because of operator precedence: & and | bind more tightly than comparison operators such as > and <.
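To see why the parentheses matter, here is a short sketch on a hypothetical integer Series. Without them, Python reads the expression as s > (500 & s) < 599, a chained comparison that has to collapse a Series into a single bool:

```python
import pandas as pd

s = pd.Series([100, 505, 550, 702])

# No parentheses: & is applied first, producing a chained comparison
# that Python evaluates with `and`, which needs a single True/False
try:
    s[s > 500 & s < 599]
except ValueError as err:
    print('ValueError:', err)

# Parentheses: each comparison yields a boolean mask, and & combines them
print(s[(s > 500) & (s < 599)])
```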