df = pd.DataFrame({'a':['x','x','x','x','x','y','y','y','y','y'],'b':['z','z','z','w','w','z','z','w','w','w'],'c':['c1','c2','c3','c1','c3','c1','c3','c1','c2','c3'],'d':range(1,11)})
a b c d
0 x z c1 1
1 x z c2 2
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
8 y w c2 9
9 y w c3 10
how can I keep only the rows that, for all combinations of a
and b
, contain the same values in c
? Or in other words, how to exclude rows with c
values that are only present in some combinations of a
and b
?
For example, only c1
and c3
are present in all combinations of a
and b
([x,z]
,[x,w]
,[y,z]
,[y,w]
), so the output would be
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10
You can get unique values in column (multiple columns) from pandas DataFrame using unique() or Series. unique() functions.
Yes it is possible to create any shape dataframe.
append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. Parameters: other : DataFrame or Series/dict-like object, or list of these.
Pandas series aka columns has a unique() method that filters out only unique values from a column. The first output shows only unique FirstNames. We can extend this method using pandas concat() method and concat all the desired columns into 1 single column and then find the unique of the resultant column.
Here is one way. Get unique lists per group and then check common elements across all the returned arrays using reduce
and np.intersect1d
. Then filter the dataframe using series.isin
and boolean indexing
from functools import reduce
out = df[df['c'].isin(reduce(np.intersect1d,df.groupby(['a','b'])['c'].unique()))]
Breakdown:
s = df.groupby(['a','b'])['c'].unique()
common_elements = reduce(np.intersect1d,s)
#Returns :-> array(['c1', 'c3'], dtype=object)
out = df[df['c'].isin(common_elements )]#.copy()
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10
Lets try groupby
with nunique
to count of unique elements per column c
group:
s = df['a'] + ',' + df['b'] # combination of a, b
m = s.groupby(df['c']).transform('nunique').eq(s.nunique())
df[m]
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With