Keep rows in data frame that, for all combinations of the values of certain columns, contain the same elements in another column

import pandas as pd

df = pd.DataFrame({'a':['x','x','x','x','x','y','y','y','y','y'],'b':['z','z','z','w','w','z','z','w','w','w'],'c':['c1','c2','c3','c1','c3','c1','c3','c1','c2','c3'],'d':range(1,11)})

   a  b   c   d
0  x  z  c1   1
1  x  z  c2   2
2  x  z  c3   3
3  x  w  c1   4
4  x  w  c3   5
5  y  z  c1   6
6  y  z  c3   7
7  y  w  c1   8
8  y  w  c2   9
9  y  w  c3  10

How can I keep only the rows whose c value appears in every combination of a and b? In other words, how do I exclude rows whose c value is present in only some combinations of a and b?

For example, only c1 and c3 are present in all combinations of a and b ([x,z],[x,w],[y,z],[y,w]), so the output would be:

   a  b   c   d
0  x  z  c1   1
2  x  z  c3   3
3  x  w  c1   4
4  x  w  c3   5
5  y  z  c1   6
6  y  z  c3   7
7  y  w  c1   8
9  y  w  c3  10
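
To check by hand which c values occur in every combination of a and b, a presence table can help. The following is only a sketch using pd.crosstab; the names presence, in_all and common are illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'z', 'z', 'w', 'w', 'w'],
    'c': ['c1', 'c2', 'c3', 'c1', 'c3', 'c1', 'c3', 'c1', 'c2', 'c3'],
    'd': range(1, 11),
})

# Rows: values of c. Columns: observed (a, b) combinations. Cells: row counts.
presence = pd.crosstab(df['c'], [df['a'], df['b']])

# A c value qualifies only if it occurs at least once in every (a, b) column.
in_all = presence.gt(0).all(axis=1)
common = in_all[in_all].index.tolist()
print(common)
```

Only combinations of a and b that actually occur in the data become columns, so "all combinations" here means all observed ones.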

Here is one way. Get the unique values of c per (a, b) group, then find the elements common to all of the returned arrays using reduce and np.intersect1d. Finally, filter the dataframe using Series.isin and boolean indexing:

from functools import reduce
import numpy as np

out = df[df['c'].isin(reduce(np.intersect1d, df.groupby(['a','b'])['c'].unique()))]

Breakdown:

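# unique values of c within each (a, b) group
s = df.groupby(['a','b'])['c'].unique()
# pairwise intersection across all groups
common_elements = reduce(np.intersect1d, s)
# -> array(['c1', 'c3'], dtype=object)

# keep rows whose c is in the common set; add .copy() if you plan to modify out
out = df[df['c'].isin(common_elements)]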

   a  b   c   d
0  x  z  c1   1
2  x  z  c3   3
3  x  w  c1   4
4  x  w  c3   5
5  y  z  c1   6
6  y  z  c3   7
7  y  w  c1   8
9  y  w  c3  10
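
The same idea can be written with plain Python sets instead of NumPy. This is a minimal sketch, not the answer's original code; sets_per_group and common are illustrative names, and it assumes the dataframe has at least one group (set.intersection raises a TypeError when called with no sets).

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'z', 'z', 'w', 'w', 'w'],
    'c': ['c1', 'c2', 'c3', 'c1', 'c3', 'c1', 'c3', 'c1', 'c2', 'c3'],
    'd': range(1, 11),
})

# One set of c values per (a, b) group.
sets_per_group = df.groupby(['a', 'b'])['c'].apply(set)

# c values shared by every group.
common = set.intersection(*sets_per_group)

# Keep only rows whose c value is shared by all groups.
out = df[df['c'].isin(common)]
print(out)
```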

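Let's try groupby with nunique to count the unique (a, b) combinations in which each value of c occurs, and keep the rows where that count equals the total number of combinations: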

s = df['a'] + ',' + df['b']  # one string key per (a, b) combination
# True where the row's c value occurs in every combination
m = s.groupby(df['c']).transform('nunique').eq(s.nunique())

df[m]

   a  b   c   d
0  x  z  c1   1
2  x  z  c3   3
3  x  w  c1   4
4  x  w  c3   5
5  y  z  c1   6
6  y  z  c3   7
7  y  w  c1   8
9  y  w  c3  10
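
Joining a and b into a string assumes both columns hold strings and that the separator never makes two different pairs look alike (for example, 'x,y' + ',' + 'z' and 'x' + ',' + 'y,z' give the same key). A possible variant, sketched here under the assumption that neither column contains missing values, uses GroupBy.ngroup to get an integer label per combination instead; group_id, groups_per_c and mask are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y'],
    'b': ['z', 'z', 'z', 'w', 'w', 'z', 'z', 'w', 'w', 'w'],
    'c': ['c1', 'c2', 'c3', 'c1', 'c3', 'c1', 'c3', 'c1', 'c2', 'c3'],
    'd': range(1, 11),
})

# Integer label for each (a, b) combination, aligned with df's index.
group_id = df.groupby(['a', 'b']).ngroup()

# For each row, the number of distinct (a, b) groups containing its c value.
groups_per_c = group_id.groupby(df['c']).transform('nunique')

# Keep rows whose c value occurs in every group.
mask = groups_per_c.eq(group_id.nunique())
out = df[mask]
print(out)
```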

HappyPy

2 Answers

anky

Shubham Sharma
