Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Select rows based on multiple object values inside a column

I have a pandas dataframe in which one of the column contains user information. Each record of this column is a list which in turn contains dictionaries of user information. Like the follwoing:

                                                USER                      id  
1  [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  634618   
2  [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  642054   
3  [{u'STATUS': u'ACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  631426    

I want to select only the rows where the STATUS is ACTV and the NAME is abc. How do I select rows where the data is nested. So in the above df only row 3 will be selected

like image 915
Fizi Avatar asked Jun 17 '26 04:06

Fizi


2 Answers

You could loop through the USER column with apply and then check if any of the dictionary satisfy the condition which makes a boolean series for subsetting:

df[df.USER.apply(lambda lst: any(d['NAME']=='abc' and d['STATUS']=='ACTV' for d in lst))]

#                                                USER      id
#3  [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':...  631426
like image 186
Psidom Avatar answered Jun 21 '26 02:06

Psidom


We can unpack your df.USER column into a pd.Panel and find the rows that way. Lots of overhead. Not worth it! But cool... maybe. I'll try again later.

pn = pd.Panel({k: pd.DataFrame(v) for k, v in df.USER.iteritems()})
cond1 = pn.loc[:, :, 'STATUS'] == 'ACTV'
cond2 = pn.loc[:, :, 'NAME'] == 'abc'

df.loc[pn.loc[(cond1 & cond2).any(), :, :].items]

                                                USER      id
2  [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':...  631426
like image 39
piRSquared Avatar answered Jun 21 '26 03:06

piRSquared



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!