
Is there an optimal way to get all combinations of values in a grouped pandas dataframe?

A sample dataframe is shown below:

import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'color': ['red', 'blue', 'green', 'red', 'blue', 'red', 'green']})

I want two columns containing every ordered pair of distinct color values within each ID group.
The resulting dataframe should look like this:

ID  color1  color2
a   red     blue
a   red     green
a   blue    red
a   blue    green
a   green   red
a   green   blue
b   red     blue
b   blue    red
c   red     green
c   green   red

I have tried itertools.permutations, but I am looking for something more direct, or a solution that relies more on pandas itself.
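For reference, the itertools.permutations approach mentioned above might look roughly like the sketch below. The original attempt is not shown, so the structure and the names rows and result are assumptions made for illustration.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'color': ['red', 'blue', 'green', 'red', 'blue', 'red', 'green']})

# Hypothetical baseline: for each ID group, emit every ordered pair
# of two different positions in that group's color values.
rows = [
    (key, c1, c2)
    for key, colors in df.groupby('ID')['color']
    for c1, c2 in permutations(colors, 2)
]
result = pd.DataFrame(rows, columns=['ID', 'color1', 'color2'])
print(result)
```

This works, but the pairing happens in a Python-level loop rather than inside pandas.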

LeCoconutWhisperer asked Jan 21 '21 19:01



1 Answer

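I think you can do a self-merge on ID, which pairs every color with every color in the same group, and then use query to drop the rows where a color is paired with itself:

df.merge(df, on='ID', suffixes=('1', '2')).query('color1 != color2')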

Or, similarly, merge and then filter with a boolean mask:

(df.merge(df, on='ID', suffixes=('1', '2'))
   .loc[lambda x: x['color1'] != x['color2']]
)

Output:

   ID color1 color2
1   a    red   blue
2   a    red  green
3   a   blue    red
5   a   blue  green
6   a  green    red
7   a  green   blue
10  b    red   blue
11  b   blue    red
14  c    red  green
15  c  green    red
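A self-contained sketch of the approach above, assuming pandas is imported as pd and using string suffixes. The names pairs and unordered are illustrative, and the color1 < color2 variant is an addition that the original answer does not include: it keeps each unordered pair once instead of both orderings.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'color': ['red', 'blue', 'green', 'red', 'blue', 'red', 'green']})

# Self-merge on ID pairs every color with every color in the same group,
# then drop rows where a color is paired with itself.
pairs = (df.merge(df, on='ID', suffixes=('1', '2'))
           .query('color1 != color2')
           .reset_index(drop=True))  # renumber rows 0..n-1 after filtering

# Variant (assumption, not in the answer): a strict string comparison
# keeps only one ordering of each pair.
unordered = (df.merge(df, on='ID', suffixes=('1', '2'))
               .query('color1 < color2')
               .reset_index(drop=True))

print(pairs)
print(unordered)
```

Both filters assume each color appears at most once per ID. If a group contained the same color twice, the self-merge would pair the two copies with each other and the != filter would discard those rows.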
Quang Hoang answered Sep 19 '22 02:09
