Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep common rows within every group of a pandas dataframe

Tags:

python

pandas

Given the following pandas data frame:

  | a  b
--+-----
0 | 1  A 
1 | 2  A 
2 | 3  A 
3 | 4  A 
4 | 1  B 
5 | 2  B 
6 | 3  B 
7 | 1  C 
8 | 3  C 
9 | 4  C 

If you group it by column b I want to perform an action that keeps only the rows where they have column a in common. The result would be the following data frame:

  | a  b
--+-----
0 | 1  A 
2 | 3  A 
4 | 1  B 
6 | 3  B 
7 | 1  C 
8 | 3  C 

Is there some built in method to do this?

like image 698
rindis Avatar asked Dec 03 '22 09:12

rindis


2 Answers

You can try pivot_table with dropna here then filter using sreries.isin :

s = df.pivot_table(index='a',columns='b',aggfunc=len).dropna().index
df[df['a'].isin(s)]

Similarly with crosstab:

s = pd.crosstab(df['a'],df['b'])
df[df['a'].isin(s[s.all(axis=1)].index)]

   a  b
0  1  A
2  3  A
4  1  B
6  3  B
7  1  C
8  3  C
like image 97
anky Avatar answered Dec 15 '22 00:12

anky


Try:

df2=df.groupby("a")["b"].nunique().eq(len(df.groupby("b").groups))
df=df.merge(df2.loc[df2].rename("filter"), on="a").drop("filter", axis=1)

Output:

   a  b
0  1  A
1  1  B
2  1  C
3  3  A
4  3  B
5  3  C
like image 38
Grzegorz Skibinski Avatar answered Dec 14 '22 23:12

Grzegorz Skibinski