I have dataframe with following columns:
Name, Surname, dateOfBirth, city, country
I am interested to find what is most common combination of name and surname and how much it occurs as well. Would be nice also to see list of top 10 combinations.
My idea for top one was:
mostFreqComb= df.groupby(['Name','Surname'])['Name'].count().argmax()
But I think it is not giving me correct answer. Help would be much appreciated !
Thanks, Neb
Seems like a good use case for the performant Counter
:
from collections import Counter
popular_names = Counter(zip(df.Name, df.Surname)).most_common(10)
For performance implications of the below solutions, see Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series. They are presented below with best performance first.
GroupBy.size
You can create a series of counts with (Name, Surname) tuple indices using GroupBy.size
:
res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)
By sorting these values, we can easily extract the most common:
most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist() # handles duplicate top counts
value_counts
Another way is to construct a series of tuples, then apply pd.Series.value_counts
:
res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()
The result will be a series of counts indexed by Name-Surname combinations, sorted from most common to least.
name, surname = res.index[0] # return most common
most_common_dups = res[res == res.max()].index.tolist()
collections.Counter
If you wish to create a dictionary of (name, surname): counts
entries, you can do so via collections.Counter
:
from collections import Counter
zipper = zip(df.Name, df.Surname)
c = Counter(zipper)
Counter
has useful methods such as most_common
, which you can use to extract your result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With