
Pandas Filter function returned a Series, but expected a scalar bool

I am trying to use filter on a pandas DataFrame to drop every row that has a duplicated value (I need to remove ALL the rows when there are duplicates, not just the first or last).

This is what I have, and it works in the editor:

df = df.groupby("student_id").filter(lambda x: x.count() == 1)

But when I run my script with this code in it I get the error:

TypeError: filter function returned a Series, but expected a scalar bool

I am creating the dataframe by concatenating two other frames immediately before trying to apply the filter.

lathomas64 asked Nov 20 '14 17:11



1 Answer

The lambda has to return a single bool per group, so count one column instead of the whole group. It should be:

In [32]: grouped = df.groupby("student_id")

In [33]: grouped.filter(lambda x: x["student_id"].count()==1)
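To see why the original lambda fails, here is a minimal sketch on a made-up frame. The `score` column and all of the values are assumptions for illustration only. Calling `count()` on a whole group returns one count per column (a Series), while counting a single column returns a scalar. The last line shows `duplicated(keep=False)` as a possible alternative that does not use groupby at all; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical data: student 2 appears twice, students 1 and 3 once.
df = pd.DataFrame({"student_id": [1, 2, 2, 3], "score": [90, 80, 85, 70]})
grouped = df.groupby("student_id")

# Look at a single group to compare the two ways of counting.
one_group = grouped.get_group(2)
per_column = one_group.count() == 1            # Series: one bool per column
scalar = one_group["student_id"].count() == 1  # a single bool

# The original lambda returns a Series for each group, so filter rejects it.
try:
    grouped.filter(lambda x: x.count() == 1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Counting one column gives filter the scalar bool it expects.
unique_rows = grouped.filter(lambda x: x["student_id"].count() == 1)

# Assumed alternative: keep=False flags every member of a duplicated set.
same_rows = df[~df.duplicated("student_id", keep=False)]

print(error_message)
print(unique_rows)
```

Using `len(x) == 1` inside the lambda should work as well, since it also yields one bool per group.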

Updates:

I'm not sure about the issue you mentioned regarding the interactive console. Technically speaking, in this particular case the console (such as IPython) should behave the same as any other environment (the plain Python interpreter, or one embedded in an IDE). There might be other situations, such as the intricacies of imports, in which different environments behave differently.

An intuitive way to understand pandas groupby is to treat the object returned by DataFrame.groupby() as a list of DataFrames. So when you use filter to apply the lambda function to x, x is actually one of those DataFrames:

In[25]: df = pd.DataFrame(data,columns=year)

In[26]: df

Out[26]: 
   2013  2014
0     0     1
1     2     3
2     4     5
3     6     7
4     0     1
5     2     3
6     4     5
7     6     7

In[27]: grouped = df.groupby(2013)

In[28]: grouped.count()

Out[28]: 
      2014
2013      
0        2
2        2
4        2
6        2

In this example, the first DataFrame in the grouped object would be:

In[33]: df1 = df.loc[[0,4]]

In[34]: df1

Out[34]: 
   2013  2014
0     0     1
4     0     1
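The "list of DataFrames" picture can be checked directly by iterating over the grouped object. The sketch below rebuilds the frame from the printed output above; the session never shows `data` and `year`, so the values given to them here are an assumption reconstructed from that output.

```python
import pandas as pd

# Assumed reconstruction of the inputs, inferred from the printed frame.
year = [2013, 2014]
data = [[0, 1], [2, 3], [4, 5], [6, 7]] * 2
df = pd.DataFrame(data, columns=year)

grouped = df.groupby(2013)

# Iterating yields (key, sub-DataFrame) pairs, one per distinct key.
groups = {key: sub for key, sub in grouped}

# The group for key 0 holds the rows labelled 0 and 4.
first = groups[0]
print(first)
```

Each `sub` here is what the lambda passed to filter receives as `x`, which is why the lambda has to reduce it to a single bool.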
leo answered Nov 15 '22 21:11
