Filtering DataFrame on groups where count of element is different than 1

I'm working with a DataFrame having the following structure:

import pandas as pd

df = pd.DataFrame({'group':[1,1,1,2,2,2,2,3,3,3],
                   'brand':['A','B','X','C','D','X','X','E','F','X']})

print(df)

   group brand
0      1     A
1      1     B
2      1     X
3      2     C
4      2     D
5      2     X
6      2     X
7      3     E
8      3     F
9      3     X

My goal is to keep only the groups that have exactly one row with brand X. Since group 2 has two rows with brand X, it should be filtered out of the resulting DataFrame.

The output should look like this:

   group brand
0      1     A
1      1     B
2      1     X
3      3     E
4      3     F
5      3     X

I know I should do a groupby on the group column and then filter out the groups whose count of X is different from 1. The filtering part is where I struggle. Any help would be appreciated.

asked Jan 16 '20 11:01 by glpsx


2 Answers

Use Series.eq to check whether brand equals 'X', then group by the group column and use transform('sum') to count the X rows in each group, and keep the groups where that count equals 1:

df[df['brand'].eq('X').groupby(df['group']).transform('sum').eq(1)]

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X
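
To see how the one-liner works, it can be split into its intermediate steps. This is only a sketch using the question's DataFrame; the names is_x, x_per_group and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'brand': ['A', 'B', 'X', 'C', 'D', 'X', 'X', 'E', 'F', 'X']})

# Boolean Series: True where the brand is exactly 'X'
is_x = df['brand'].eq('X')

# Sum the booleans within each group and broadcast that total back to every row
x_per_group = is_x.groupby(df['group']).transform('sum')

# Keep only the rows whose group contains exactly one 'X'
result = df[x_per_group.eq(1)]
print(result)
```

Because transform returns a Series aligned with the original index, it can be used directly as a boolean mask on df.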
answered Oct 23 '22 19:10 by anky


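This should work as well, as long as 'X' does not appear inside any other brand name: transform('sum') concatenates each group's brands into one string, and str.count('X') counts every occurrence of 'X' in that string.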

df[df.groupby(['group'])['brand'].transform('sum').str.count('X').eq(1)]

Output

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X
answered Oct 23 '22 20:10 by moys
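
Since the question talks about grouping and then filtering, GroupBy.filter may also be worth a look. The following is a minimal sketch rather than one of the posted answers; it assumes the question's DataFrame, and the names kept and kept_reset are illustrative. The predicate compares whole values with eq, so a brand such as 'XL' would not be counted as X.

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'brand': ['A', 'B', 'X', 'C', 'D', 'X', 'X', 'E', 'F', 'X']})

# filter() keeps every row of a group when the function returns True for it:
# here, groups in which 'brand' equals 'X' exactly once
kept = df.groupby('group').filter(lambda g: g['brand'].eq('X').sum() == 1)

# Optional: renumber the rows 0..n-1, as in the question's desired output
kept_reset = kept.reset_index(drop=True)
print(kept_reset)
```

The lambda runs once per group, so on a DataFrame with many groups this will probably be slower than the transform-based mask, but it reads closer to the "group, then filter" wording of the question.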