
How to generate all pairs of values from the result of a groupby in a pandas dataframe

I have a pandas dataframe df:

ID     words
1      word1
1      word2
1      word3
2      word4
2      word5
3      word6
3      word7
3      word8
3      word9
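
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch: the column names and values are copied from the table, and the integer type of ID is an assumption.

```python
import pandas as pd

# Sample data copied from the table above (ID assumed to be an integer column)
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'words': ['word1', 'word2', 'word3', 'word4', 'word5',
              'word6', 'word7', 'word8', 'word9'],
})
print(df)
```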

I want to produce another dataframe that contains all pairs of words within each ID group. So the result for the above would be:

ID     wordA    wordB
1      word1    word2
1      word1    word3
1      word2    word3
2      word4    word5
3      word6    word7
3      word6    word8
3      word6    word9
3      word7    word8
3      word7    word9
3      word8    word9

I know that I can use df.groupby('ID')['words'] to get the words within each ID.

I also know that I can use

import itertools

iterable = ['word1','word2','word3']
list(itertools.combinations(iterable, 2))

to get all possible pairwise combinations. However, I'm a little lost as to the best way to generate a resulting dataframe as shown above.

BKS asked Dec 03 '17 13:12




2 Answers

It's simple: use itertools.combinations inside apply, then stack, i.e.

from itertools import combinations
import pandas as pd

# Parentheses let the method chain continue onto the next line
ndf = (df.groupby('ID')['words'].apply(lambda x : list(combinations(x.values,2)))
         .apply(pd.Series).stack().reset_index(level=0,name='words'))

 ID           words
0   1  (word1, word2)
1   1  (word1, word3)
2   1  (word2, word3)
0   2  (word4, word5)
0   3  (word6, word7)
1   3  (word6, word8)
2   3  (word6, word9)
3   3  (word7, word8)
4   3  (word7, word9)
5   3  (word8, word9)

To match your exact output, we further have to split the tuples into two columns:

sdf = pd.concat([ndf['ID'],ndf['words'].apply(pd.Series)],axis=1).set_axis(['ID','WordsA','WordsB'],axis=1)

   ID WordsA WordsB
0   1  word1  word2
1   1  word1  word3
2   1  word2  word3
0   2  word4  word5
0   3  word6  word7
1   3  word6  word8
2   3  word6  word9
3   3  word7  word8
4   3  word7  word9
5   3  word8  word9

To do it all in a single chained expression:

combo = df.groupby('ID')['words'].apply(combinations,2)\
                     .apply(list).apply(pd.Series)\
                     .stack().apply(pd.Series)\
                     .set_axis(['WordsA','WordsB'],axis=1)\
                     .reset_index(level=0)
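
If the repeated apply(pd.Series) calls feel heavy, the same rows can also be collected by looping over the groups directly. This is a sketch of an alternative, not the chain above rewritten: the sample frame is rebuilt from the question so the snippet runs on its own, and the names rows and combo are just illustrative.

```python
from itertools import combinations
import pandas as pd

# Sample data from the question (rebuilt here so the snippet is self-contained)
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'words': ['word1', 'word2', 'word3', 'word4', 'word5',
              'word6', 'word7', 'word8', 'word9'],
})

# One (ID, wordA, wordB) tuple per pair, in the order combinations() yields them
rows = [(key, a, b)
        for key, grp in df.groupby('ID')['words']
        for a, b in combinations(grp, 2)]

combo = pd.DataFrame(rows, columns=['ID', 'WordsA', 'WordsB'])
print(combo)
```

Unlike the chained version, this gives a plain 0..n-1 index rather than a per-group one.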

Bharath answered Sep 30 '22 19:09


You can use groupby with apply and return a DataFrame from the applied function. Then call reset_index(level=1, drop=True) to remove the second index level, and reset_index() once more to turn the ID index into a column:

from itertools import combinations

f = lambda x : pd.DataFrame(list(combinations(x.values,2)), 
                            columns=['wordA','wordB'])
df = (df.groupby('ID')['words'].apply(f)
                               .reset_index(level=1, drop=True)
                               .reset_index())
print (df)
   ID  wordA  wordB
0   1  word1  word2
1   1  word1  word3
2   1  word2  word3
3   2  word4  word5
4   3  word6  word7
5   3  word6  word8
6   3  word6  word9
7   3  word7  word8
8   3  word7  word9
9   3  word8  word9
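
A pure-pandas variant without itertools is also possible by merging the frame with itself on ID and keeping each unordered pair once. The sketch below is an alternative under stated assumptions, not the method shown above: the sample frame is rebuilt from the question, and the helper column pos and the names numbered, pairs and result are hypothetical.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the snippet is self-contained)
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'words': ['word1', 'word2', 'word3', 'word4', 'word5',
              'word6', 'word7', 'word8', 'word9'],
})

# Number each row within its ID group: 0, 1, 2, ...
numbered = df.assign(pos=df.groupby('ID').cumcount())

# Pair every row with every row of the same ID, then keep only pairs
# where the left row comes before the right row in the original order
pairs = numbered.merge(numbered, on='ID', suffixes=('A', 'B'))
pairs = pairs[pairs['posA'] < pairs['posB']].sort_values(['ID', 'posA', 'posB'])

result = (pairs[['ID', 'wordsA', 'wordsB']]
          .rename(columns={'wordsA': 'wordA', 'wordsB': 'wordB'})
          .reset_index(drop=True))
print(result)
```

Note that the merge materialises every ordered pair per group before filtering, so for large groups the combinations-based approach is likely to use less memory.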

jezrael answered Sep 30 '22 19:09