Number of occurrence of pair of value in dataframe

Question

I have dataframe with following columns:

Name, Surname, dateOfBirth, city, country

I am interested to find what is most common combination of name and surname and how much it occurs as well. Would be nice also to see list of top 10 combinations.

My idea for top one was:

mostFreqComb= df.groupby(['Name','Surname'])['Name'].count().argmax()

But I think it is not giving me correct answer. Help would be much appreciated !

Thanks, Neb

cs95 · Accepted Answer

Seems like a good use case for the performant Counter:

from collections import Counter
popular_names = Counter(zip(df.Name, df.Surname)).most_common(10)

jpp · Answer

For performance implications of the below solutions, see Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series. They are presented below with best performance first.

`GroupBy.size`

You can create a series of counts with (Name, Surname) tuple indices using GroupBy.size:

res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)

By sorting these values, we can easily extract the most common:

most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist()  # handles duplicate top counts

`value_counts`

Another way is to construct a series of tuples, then apply pd.Series.value_counts:

res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

The result will be a series of counts indexed by Name-Surname combinations, sorted from most common to least.

name, surname = res.index[0]  # return most common
most_common_dups = res[res == res.max()].index.tolist()

`collections.Counter`

If you wish to create a dictionary of (name, surname): counts entries, you can do so via collections.Counter:

from collections import Counter

zipper = zip(df.Name, df.Surname)
c = Counter(zipper)

Counter has useful methods such as most_common, which you can use to extract your result.

Number of occurrence of pair of value in dataframe

Tags:

python

pandas

series

Nebojsa V NLPTrader

Video Answer

2 Answers

cs95

`GroupBy.size`

`value_counts`

`collections.Counter`

jpp

Recent Activity

Donate For Us

Number of occurrence of pair of value in dataframe

Tags:

python

pandas

series

Nebojsa V NLPTrader

Video Answer

2 Answers

cs95

GroupBy.size

value_counts

collections.Counter

jpp

Related questions

Recent Activity

Donate For Us

`GroupBy.size`

`value_counts`

`collections.Counter`