
Most common words in string

I am new to Ruby and am trying to write a method that returns an array of the most common word(s) in a string. If one word has the highest count, that word should be returned. If two words are tied for the highest count, both should be returned in an array.

The problem is that when I pass in the second string, the code counts "words" only twice instead of three times. When I pass in the third string, it returns "it" with a count of 2, which makes no sense, since "it" should have a count of 1.

def most_common(string)
  counts = {}
  words = string.downcase.tr(",.?!",'').split(' ')

  words.uniq.each do |word|
    counts[word] = 0
  end

  words.each do |word|
    counts[word] = string.scan(word).count
  end

  max_quantity = counts.values.max
  max_words = counts.select { |k, v| v == max_quantity }.keys
  puts max_words
end

most_common('a short list of words with some words') #['words']
most_common('Words in a short, short words, lists of words!') #['words']
most_common('a short list of words with some short words in it') #['words', 'short']
Daniel Bonnell asked Dec 19 '22 10:12


1 Answer

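The problem is the way you count occurrences of each word. string.scan(word) matches substrings rather than whole words, so "it" is also found inside "with" and gets counted twice. The scan also runs against the original string instead of the downcased one, so the capitalized "Words" in your second string is never matched, which is why "words" is counted only twice there.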

[1] pry(main)> 'with some words in it'.scan('it')
=> ["it", "it"]
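To make the difference concrete, here is a small sketch that compares substring matching with whole-word counting on the strings from the question. The helper names substring_count and whole_word_count are made up for this illustration and do not come from the question.

```ruby
# Hypothetical helpers, for illustration only.

# Counts every occurrence of `word` as a substring of the raw string,
# which is what String#scan does when given a plain string pattern.
def substring_count(string, word)
  string.scan(word).count
end

# Counts only whole tokens, after the same normalization the question
# uses (downcase, strip punctuation, split on spaces).
def whole_word_count(string, word)
  string.downcase.tr(",.?!", '').split(' ').count(word)
end

third = 'a short list of words with some short words in it'
p substring_count(third, 'it')    # also matches the "it" inside "with"
p whole_word_count(third, 'it')

second = 'Words in a short, short words, lists of words!'
p substring_count(second, 'words')    # misses the capitalized "Words"
p whole_word_count(second, 'words')
```

Counting over the already-split words array avoids both the substring problem and the case problem, because every comparison is against a normalized whole token.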

There is a simpler way to do this, though. You can count how many times each value appears in an array with an each_with_object call, like so:

counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

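This goes through each entry in the array and adds 1 to that word's value in the hash. Because the hash is created with Hash.new(0), a word that has not been seen yet starts at 0, so the first += 1 works without any setup loop.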
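If you are on Ruby 2.7 or newer, Enumerable#tally should build the same hash in a single call. This is only a sketch under that version assumption, and the method names count_with_hash and count_with_tally are invented here for the comparison.

```ruby
# Assumes Ruby 2.7+ for Enumerable#tally. Method names are hypothetical.

# Explicit form: Hash.new(0) supplies 0 for unseen keys, so += 1 works
# the first time each word is encountered.
def count_with_hash(words)
  words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
end

# Built-in form: returns a hash mapping each element to its count.
def count_with_tally(words)
  words.tally
end

words = %w[a short list of words with some words]
p count_with_hash(words)
p count_with_tally(words)
```

One difference to keep in mind: the hash returned by tally has no default value, so looking up a word that never appeared gives nil rather than 0.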

So the following should work for you:

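def most_common(string)
  # Normalize: lowercase, strip punctuation, split into whole words.
  words = string.downcase.tr(",.?!",'').split(' ')
  # Count each word; Hash.new(0) makes unseen words start at 0.
  counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  max_quantity = counts.values.max
  # Return every word tied for the highest count.
  counts.select { |k, v| v == max_quantity }.keys
end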

p most_common('a short list of words with some words') # => ["words"]
p most_common('Words in a short, short words, lists of words!') # => ["words"]
p most_common('a short list of words with some short words in it') # => ["short", "words"]
Nick Veys answered Dec 29 '22 20:12