 

pandas not in, in and between

Tags:

pandas

numpy

pd.__version__: '0.14.0'

I need to do a not in statement for a column in a dataframe.

For the isin statement, I use the following to filter for the codes that I need:

h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]

I want to do a "not in" or "not equal to" statement (not sure which one Python uses) for another column.

So I tried the following:

h1 = df1[df1['csc_auth_12'].notin(['N6M','YEM','YEL','YEM'])]

h1 = df1[df1['csc_auth_12'] not in (['N6M','YEM','YEL','YEM'])]

and:

h1.query(['N6M','YEM','YEL','YEM'] not in ['csc_auth_12'])

I really want to filter out the N6M, YEM, YEL and YEM from the data set.

I'm also interested in how to do a "between" statement.

For the following, I had to type in all of the 500-series codes manually. I would like to do something like:

h1 = df1[df1['nat_actn_2_3'].isin['100','102'] and isbetween [500 & 599])]

but this is what I have:

h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104','107','108','112','115','117','120','122','124','128',
                             '130','132','132','140','141','142','143','145','146','147','148','149','170','171',
                             '172','173','179','190','198','199','501','502','503','504','505','506','507','508',
                             '509','510','511','512','513','514','515','516','517','518','519','520','521','522',
                             '523','524','525','526','527','528','529','530','531','532','533','534','535','536',
                             '537','538','539','540','541','542','543','544','545','546','547','548','549','550',
                             '551','552','553','554','555','556','557','558','559','560','561','562','563','564',
                             '565','566','567','568','569','570','571','572','573','574','575','576','577','578',
                             '579','580','581','582','583','584','585','586','587','588','589','590','591','592',
                             '593','594','595','596','597','598','599','702','721','740','953','955'])]

Any suggestions?

Thanks.

asked Oct 06 '15 at 21:10 by Dave



1 Answer

Negate the boolean condition with ~ to invert the mask:

h1 = df1[~df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
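Applied to the column the question actually wants to exclude, a minimal sketch follows. The sample rows are made up; only the column name csc_auth_12 and the excluded codes come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name and the excluded codes come from the question.
df1 = pd.DataFrame({'csc_auth_12': ['N6M', 'AB1', 'YEL', 'CD2', 'YEM']})

# isin() builds a boolean mask that is True where the code IS in the list;
# ~ flips it, so those rows are dropped and everything else is kept.
h1 = df1[~df1['csc_auth_12'].isin(['N6M', 'YEM', 'YEL'])]
print(h1['csc_auth_12'].tolist())  # ['AB1', 'CD2']
```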

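As for notin and not in: the former doesn't exist as a Series method (so you get an AttributeError), and the latter raises a ValueError ("The truth value of a Series is ambiguous"), because Python's in operator tries to reduce an element-wise comparison against the whole Series to a single True/False, which pandas does not allow.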
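A small sketch on made-up data that reproduces the ValueError, and shows that DataFrame.query does understand not in when the whole expression is passed as a string (the question's query attempt passed a Python expression instead). Only the column name and codes come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name and the codes come from the question.
df1 = pd.DataFrame({'csc_auth_12': ['N6M', 'AB1', 'YEL', 'CD2', 'YEM']})

# `series not in some_list` makes Python compare list elements with the whole
# Series; each comparison returns a boolean Series, and turning that into a
# single True/False is ambiguous, so pandas raises ValueError.
try:
    df1[df1['csc_auth_12'] not in ['N6M', 'YEM', 'YEL']]
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# query() parses `not in` inside the string and applies it element-wise.
h1 = df1.query("csc_auth_12 not in ['N6M', 'YEM', 'YEL']")
print(h1['csc_auth_12'].tolist())  # ['AB1', 'CD2']
```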

For the second question, you need to combine your boolean conditions, like so:

codes = df1['nat_actn_2_3'].astype(int)  # the codes are stored as strings, so convert them before comparing numerically
h1 = df1[codes.isin([100, 102]) | ((codes > 500) & (codes < 599))]

I'm assuming from your text that you want rows whose code is either 100/102 or between 500 and 599. It's unclear whether you want to include the endpoints; if so, change > and < to >= and <= respectively.
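The "isbetween" the question hoped for does exist as Series.between, which includes both endpoints by default. A sketch on made-up data, assuming every code is an integer written as a plain string, contrasting the exclusive and inclusive filters; the last filter keeps the column as strings and generates the 500-series codes with range() instead of typing them out.

```python
import pandas as pd

# Hypothetical sample data; codes are strings, as in the question.
df1 = pd.DataFrame({'nat_actn_2_3': ['100', '101', '102', '500', '550', '599', '600', '702']})
codes = df1['nat_actn_2_3'].astype(int)

# Exclusive range: 500 and 599 themselves are dropped.
exclusive = df1[codes.isin([100, 102]) | ((codes > 500) & (codes < 599))]

# Inclusive range: between() keeps both endpoints by default,
# equivalent to (codes >= 500) & (codes <= 599).
inclusive = df1[codes.isin([100, 102]) | codes.between(500, 599)]

# Staying with strings: build '500'..'599' with range() and pass the list to isin().
wanted = ['100', '102'] + [str(c) for c in range(500, 600)]
inclusive_str = df1[df1['nat_actn_2_3'].isin(wanted)]

print(exclusive['nat_actn_2_3'].tolist())  # ['100', '102', '550']
print(inclusive['nat_actn_2_3'].tolist())  # ['100', '102', '500', '550', '599']
```

The string-based version only matches codes spelled exactly like the generated ones (no leading zeros or stray whitespace), so converting with astype(int) is the more forgiving choice when the data is messy.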

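Here you use the bitwise operators & and | for "and" and "or" respectively, because Python's and/or keywords cannot combine two Series element-wise. You also need to wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.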
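A small sketch of both pitfalls on a made-up integer Series: without parentheses the expression is parsed in an unintended order, and the and keyword tries to reduce a whole Series to one boolean.

```python
import pandas as pd

# Hypothetical sample data.
codes = pd.Series([100, 450, 550, 650])

# & binds tighter than > and <, so this parses as the chained comparison
# codes > (500 & codes) < 599, which needs the truth value of a whole
# Series and raises ValueError.
try:
    codes[codes > 500 & codes < 599]
    unparenthesised_ok = True
except ValueError:
    unparenthesised_ok = False

# The `and` keyword has the same problem: it calls bool() on a Series.
try:
    codes[(codes > 500) and (codes < 599)]
    and_keyword_ok = True
except ValueError:
    and_keyword_ok = False

# Parenthesised conditions joined with & are evaluated element-wise.
in_range = codes[(codes > 500) & (codes < 599)]
print(in_range.tolist())  # [550]
```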

answered Sep 19 '22 at 04:09 by EdChum
