How to generate all pairs of values from the result of a groupby in a pandas DataFrame

Tags: python, pandas, python-2.7, combinations

I have a pandas dataframe df:

ID     words
1      word1
1      word2
1      word3
2      word4
2      word5
3      word6
3      word7
3      word8
3      word9
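To run the snippets on this page, the sample frame can be rebuilt as below. This is a sketch: the values are copied from the table above, and `df` is the name the question uses.

```python
import pandas as pd

# Sample data from the question: one row per (ID, word) pair
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})
print(df)
```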

I want to produce another DataFrame that contains all pairs of words within each group. For the data above, the result would be:

ID     wordA    wordB
1      word1    word2
1      word1    word3
1      word2    word3
2      word4    word5
3      word6    word7
3      word6    word8
3      word6    word9
3      word7    word8
3      word7    word9
3      word8    word9

I know that I can use df.groupby('ID')['words'] to get the words within each ID.
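For reference, a minimal sketch of what that grouping gives when each group's words are collected into a list. The sample data is rebuilt so the block runs on its own; `words_per_id` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

# Group by ID, select the words column, and gather each group's words into a list
words_per_id = df.groupby("ID")["words"].apply(list)
print(words_per_id.to_dict())
```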

I also know that I can use

import itertools

iterable = ['word1', 'word2', 'word3']
list(itertools.combinations(iterable, 2))

to get all possible pairwise combinations. However, I'm a little lost as to the best way to generate a resulting dataframe as shown above.
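One straightforward way to combine the two pieces is sketched below: iterate over the groups, let combinations emit each unordered pair once, and build the DataFrame in a single call at the end. A group of n words yields n(n-1)/2 pairs, so the sample gives 3 + 1 + 6 = 10 rows, matching the expected output. The names `rows` and `result` are illustrative.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

# Iterating a SeriesGroupBy yields (group key, Series of that group's words)
rows = [
    (key, a, b)
    for key, group in df.groupby("ID")["words"]
    for a, b in combinations(group, 2)  # each unordered pair once, in input order
]
result = pd.DataFrame(rows, columns=["ID", "wordA", "wordB"])
print(result)
```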


asked Dec 03 '17 13:12

BKS

2 Answers

It's simple: use itertools.combinations inside apply, then stack:

import pandas as pd
from itertools import combinations

ndf = (df.groupby('ID')['words'].apply(lambda x: list(combinations(x.values, 2)))
         .apply(pd.Series).stack().reset_index(level=0, name='words'))

 ID           words
0   1  (word1, word2)
1   1  (word1, word3)
2   1  (word2, word3)
0   2  (word4, word5)
0   3  (word6, word7)
1   3  (word6, word8)
2   3  (word6, word9)
3   3  (word7, word8)
4   3  (word7, word9)
5   3  (word8, word9)

To match your exact output, we further have to do:

sdf = pd.concat([ndf['ID'], ndf['words'].apply(pd.Series)], axis=1).set_axis(['ID', 'WordsA', 'WordsB'], axis=1)

   ID WordsA WordsB
0   1  word1  word2
1   1  word1  word3
2   1  word2  word3
0   2  word4  word5
0   3  word6  word7
1   3  word6  word8
2   3  word6  word9
3   3  word7  word8
4   3  word7  word9
5   3  word8  word9

To do it all in one chain:

combo = (df.groupby('ID')['words'].apply(lambda x: list(combinations(x, 2)))
           .apply(pd.Series)
           .stack().apply(pd.Series)
           .set_axis(['WordsA', 'WordsB'], axis=1)
           .reset_index(level=0))
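On pandas 0.25 or later, Series.explode may be a simpler alternative to the apply(pd.Series).stack() step: it turns each group's list of tuples into one row per tuple while keeping ID in the index. This is a sketch under that version assumption. The dropna() call is there so that an ID with a single word (which yields an empty list, exploded to NaN) is skipped; column names follow the output above.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

pairs = (
    df.groupby("ID")["words"]
    .apply(lambda s: list(combinations(s, 2)))  # one list of (wordA, wordB) tuples per ID
    .explode()                                  # one row per tuple; ID stays in the index
    .dropna()                                   # drop NaN rows left by single-word groups
)
# Split the tuples into two columns, then put ID back in front
combo = pd.DataFrame(pairs.tolist(), columns=["WordsA", "WordsB"])
combo.insert(0, "ID", pairs.index.to_numpy())
print(combo)
```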


answered Sep 30 '22 19:09

Bharath

You can use groupby with apply and return a DataFrame for each group, then call reset_index(level=1, drop=True) to remove the second index level and reset_index() again to turn ID from the index into a column:

import pandas as pd
from itertools import combinations

f = lambda x: pd.DataFrame(list(combinations(x.values, 2)),
                           columns=['wordA', 'wordB'])
df = (df.groupby('ID')['words'].apply(f)
        .reset_index(level=1, drop=True)
        .reset_index())
print(df)
   ID  wordA  wordB
0   1  word1  word2
1   1  word1  word3
2   1  word2  word3
3   2  word4  word5
4   3  word6  word7
5   3  word6  word8
6   3  word6  word9
7   3  word7  word8
8   3  word7  word9
9   3  word8  word9
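A different technique that avoids itertools entirely is a self-join: merge the frame with itself on ID, then keep each pair once by comparing the words' positions within the group. This is a sketch; `pos` is a hypothetical helper column. The join materialises n² rows per group before filtering, so it may use a lot of memory for large groups.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

# Number the words within each ID (0, 1, 2, ...) so each pair can be kept once
numbered = df.assign(pos=df.groupby("ID").cumcount())

# Pair every word with every word of the same ID; overlapping columns get A/B suffixes
pairs = numbered.merge(numbered, on="ID", suffixes=("A", "B"))

# Keep only pairs where the first word comes before the second within its group
pairs = pairs[pairs["posA"] < pairs["posB"]]
result = (
    pairs.rename(columns={"wordsA": "wordA", "wordsB": "wordB"})
    [["ID", "wordA", "wordB"]]
    .reset_index(drop=True)
)
print(result)
```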

answered Sep 30 '22 19:09

jezrael
