How to count word frequencies efficiently in Ruby?

Tags: regex, ruby

Sample input:

"I was 09809 home -- Yes! yes!  You was"

and output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }

My code that does not work:

def get_words_f(myStr)
    myStr=myStr.downcase.scan(/\w/).to_s;
    h = Hash.new(0)
    myStr.split.each do |w|
       h[w] += 1 
    end
    return h.to_a;
end

print get_words_f('I was 09809 home -- Yes! yes!  You was');
Ben asked Mar 12 '12 21:03


2 Answers

def count_words(string)
  string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end

Second variant:

def count_words(string)
  string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
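
On Ruby 2.7 or later, the same count can also be sketched with Enumerable#tally, which builds the hash of occurrences directly. The method name count_words_tally is made up for this illustration. Like the two variants above, it still treats digit-only tokens such as 09809 as words.

```ruby
# Hypothetical helper name; requires Ruby 2.7+ for Enumerable#tally.
def count_words_tally(string)
  # Lowercase once, extract runs of word characters, then count duplicates.
  string.downcase.scan(/\w+/).tally
end

p count_words_tally('I was 09809 home -- Yes! yes!  You was')
```

Unlike Hash.new(0), the hash returned by tally has no default value, so looking up a word that never occurred returns nil rather than 0.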
megas answered Nov 14 '22 03:11


This works but I am kinda new to Ruby too. There might be a better solution.

def count_words(string)
  words = string.split(' ')
  frequency = Hash.new(0)
  words.each { |word| frequency[word.downcase] += 1 }
  return frequency
end

Instead of .split(' '), you could also do .scan(/\w+/); however, .scan(/\w+/) would separate aren and t in "aren't", while .split(' ') won't.

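Output for your example input. Note that this is what the .scan(/\w+/) variant returns; with .split(' ') the punctuation stays attached to the words, so the keys would include "--" and "yes!" rather than "yes":

print count_words('I was 09809 home -- Yes! yes!  You was')

# {"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "yes"=>2, "you"=>1}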
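
Neither .split(' ') nor .scan(/\w+/) reproduces the output requested in the question exactly, since 09809 should not be counted and the result is listed by descending frequency. Here is a minimal sketch, assuming that a word is a run of ASCII letters optionally joined by apostrophes (so "aren't" stays whole). The name word_frequencies and that definition of a word are assumptions made for illustration, not something stated in the question.

```ruby
# Hypothetical helper: counts words, ignoring numbers and punctuation.
# Assumption: a "word" is a run of ASCII letters, optionally joined by
# single apostrophes (so "aren't" is one word and "09809" is skipped).
def word_frequencies(string)
  counts = Hash.new(0)
  string.downcase.scan(/[a-z]+(?:'[a-z]+)*/) { |word| counts[word] += 1 }
  # Most frequent first; ties are broken alphabetically so the order is
  # deterministic. sort_by returns pairs, so convert back to a Hash.
  counts.sort_by { |word, count| [-count, word] }.to_h
end

p word_frequencies('I was 09809 home -- Yes! yes!  You was')
```

The alphabetical tie-break is a design choice of this sketch; the question does not say how words with equal counts should be ordered.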
emre nevayeshirazi answered Nov 14 '22 02:11