
How can I find the most frequent two-column combination in a dataframe in python

Tags:

python

pandas

I have my data in pandas data frame as follows:

df = pd.DataFrame({'a':[1,2,3,3,4,4,4], 'b':[2,3,4,4,5,5,5]})

So the dataframe looks like this:

   a  b
0  1  2
1  2  3
2  3  4
3  3  4
4  4  5
5  4  5
6  4  5

The ('a', 'b') combinations here are (1, 2) once, (2, 3) once, (3, 4) twice, and (4, 5) three times. I am trying to select 4 and 5 and print them out, because that combination occurs most often (3 times).

My code is:

counts = df.groupby(['a','b']).size().sort_values(ascending=False)
print(counts)

Output:

a  b
4  5    3
3  4    2
2  3    1
1  2    1
dtype: int64

But this only gives me the values [3, 2, 1, 1], which are the counts for each combination. How can I access the elements 4 and 5 individually so I can print them out?

Thanks in advance!

Zimu asked Oct 29 '18 01:10





1 Answer

Use idxmax. It returns the index label of the maximum value, so it works whether or not the result is sorted. Because the groupby is on two columns, that label is the (a, b) tuple:

df.groupby(['a','b']).size().idxmax()
Out[15]: (4, 5)
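
To get 4 and 5 as separate values, the tuple returned by idxmax can be unpacked. The following is a minimal sketch using the dataframe from the question; the names top_a, top_b, top_count and tied_pairs are only illustrative, and the tie handling at the end is an optional extra that the question did not ask for.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4], 'b': [2, 3, 4, 4, 5, 5, 5]})

# Count each (a, b) pair; the result is a Series indexed by (a, b).
counts = df.groupby(['a', 'b']).size()

# idxmax() returns the index label of the largest count, here a tuple,
# so it can be unpacked into the two column values.
top_a, top_b = counts.idxmax()
top_count = counts.max()

print(top_a, top_b, top_count)  # 4 5 3

# idxmax() reports only the first maximum. If several pairs can tie
# for first place, this keeps all of them as a list of tuples.
tied_pairs = counts[counts == top_count].index.tolist()
print(tied_pairs)
```

If the counts are already sorted in descending order, as in the question, counts.index[0] gives the same tuple.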
BENY answered Oct 18 '22 21:10
