I have an array of email addresses (over 50,000) and I want to count how often each email domain occurs. For example, if I had:
emails = [
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]'
]
and I am interested in which email domain appears the most, I would want to return 'gmail' with frequency 2.
To do this, I thought it would be a good idea to go through the array and discard everything occurring before the @ and just keep the domains as a new array, which I could then iterate over. How would I do this?
Assuming your emails are strings, you can do something like this:
emails = ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"]
counts = Hash.new(0)
emails.each { |t| counts[t.partition("@").last] += 1 }
counts # => {"gmail.com"=>2, "yahoo.com"=>1, "aol.com"=>1, "someuni.xyz.com"=>1}
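The question also asks how to build a separate array of domains first and then iterate over it. A minimal sketch of that variant follows, assuming Ruby 2.7 or later for `Enumerable#tally`. The addresses are made-up placeholders, not the asker's data.

```ruby
# Hypothetical sample addresses, for illustration only.
emails = [
  "alice@gmail.com",
  "bob@yahoo.com",
  "carol@aol.com",
  "dave@someuni.xyz.com",
  "erin@gmail.com"
]

# Keep only the part after the "@" as a new array.
domains = emails.map { |email| email.partition("@").last }

# Count how often each domain occurs (Enumerable#tally, Ruby 2.7+).
counts = domains.tally

# Pick the [domain, count] pair with the highest count.
top_domain, top_count = counts.max_by { |_domain, count| count }
```

On older Rubies without `tally`, the `Hash.new(0)` loop above does the same counting.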
Similar to mudasobwa's answer.
emails
.group_by{|s| s.partition("@").last}
.map{|k, v| [k, v.length]}
.max_by(&:last)
# => ["gmail.com", 2]
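Both answers return the full domain ("gmail.com"), while the question asks for 'gmail' with frequency 2. One possible way to get the short name, sketched here under the assumption that the short name is simply the first dot-separated label of the domain and that case should be ignored. The sample addresses are hypothetical.

```ruby
# Hypothetical sample addresses, for illustration only.
emails = [
  "alice@gmail.com",
  "bob@yahoo.com",
  "carol@aol.com",
  "dave@someuni.xyz.com",
  "erin@Gmail.com"
]

counts = Hash.new(0)
emails.each do |email|
  # Everything after the "@", lowercased so "Gmail.com" and "gmail.com" match.
  domain = email.partition("@").last.downcase
  # Assumption: the short name is the first label, e.g. "gmail" in "gmail.com".
  name = domain.split(".").first
  # Strings without an "@" yield no label and are skipped.
  counts[name] += 1 if name
end

# Pick the [name, count] pair with the highest count.
name, frequency = counts.max_by { |_name, count| count }
```

Note that this treats "someuni.xyz.com" as "someuni"; if subdomains matter for your data, a different rule for choosing the label would be needed.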