
Ruby, Count syllables

Tags: ruby, nlp

I am using Ruby to calculate the Gunning Fog Index of some content, and I have successfully implemented the algorithm described here:

Gunning Fog Index
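
For reference, the index is usually stated as 0.4 * ((words / sentences) + 100 * (complex words / words)), where a complex word has three or more syllables. A minimal sketch of that formula, assuming the three counts are computed elsewhere (the method name `gunning_fog` and its parameters are only illustrative):

```ruby
# Sketch of the Gunning Fog formula:
#   0.4 * ((words / sentences) + 100 * (complex_words / words))
# The three counts are assumed to be supplied by the caller.
def gunning_fog(word_count, sentence_count, complex_word_count)
  # Avoid dividing by zero on empty input.
  return 0.0 if word_count.zero? || sentence_count.zero?

  average_sentence_length = word_count.to_f / sentence_count
  percent_complex = 100.0 * complex_word_count / word_count
  0.4 * (average_sentence_length + percent_complex)
end
```

The syllable counter only matters for the complex-word count, so any word it over-counts from 2 to 3 syllables inflates the index.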

I am using the below method to count the number of syllables in each word:

Tokenizer = /([aeiouy]{1,3})/

def count_syllables(word)

  len = 0

  if word[-3..-1] == 'ing' then
    len += 1
    word = word[0...-3]
  end

  got = word.scan(Tokenizer)
  len += got.size()

  if got.size() > 1 and got[-1] == ['e'] and
      word[-1].chr() == 'e' and
      word[-2].chr() != 'l' then
    len -= 1
  end

  return len

end

It sometimes counts words with only 2 syllables as having 3. Can anyone offer advice, or does anyone know of a better method?

text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

word_array = text.split(' ')

word_array.each do |word|
    puts word if count_syllables(word) > 2
end
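
To see where a count comes from, it helps to print the vowel groups the pattern matches for a single word. A minimal sketch (the helper name `vowel_groups` is only for illustration; the pattern is the same as `Tokenizer` above):

```ruby
# Returns the vowel groups matched by the same pattern as Tokenizer.
# Each match is a one-element array because the pattern has a capture group.
def vowel_groups(word)
  word.scan(/([aeiouy]{1,3})/)
end

p vowel_groups("themselves")  # => [["e"], ["e"], ["e"]]
```

The "e" in the "-ves" ending is matched as its own group, and the silent-e check in count_syllables only applies when the word itself ends in "e", so nothing is subtracted.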

"themselves" is being counted as 3 but it's only 2

RailsSon asked Aug 13 '09 13:08



1 Answer

The function I gave you before is based on the simple rules outlined here:

Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:

  • Ignore final -ES, -ED, -E (except for -LE)
  • Words of three letters or less count as one syllable
  • Consecutive vowels count as one syllable.

Here's the code:

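def new_count(word)
  word.downcase!                    # note: modifies the caller's string
  return 1 if word.length <= 3      # short words count as one syllable
  # drop a final -es or -e unless preceded by 'l' or a vowel; always drop -ed
  word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  word.sub!(/^y/, '')               # a leading 'y' is not counted as a vowel
  word.scan(/[aeiouy]{1,2}/).size   # each run of one or two vowels is one syllable
end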

Obviously, this isn't perfect either, but all you'll ever get with something like this is a heuristic.
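
One common way to live with a heuristic is to put a small table of known exceptions in front of it. The following is only a sketch under that assumption: `SYLLABLE_OVERRIDES`, `heuristic_count`, and `syllables` are hypothetical names, and the table entries are a few illustrative words that the rules above miscount, not a vetted list.

```ruby
# Non-mutating copy of the heuristic above, so this snippet runs on its own.
def heuristic_count(word)
  w = word.downcase
  return 1 if w.length <= 3
  w = w.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '').sub(/^y/, '')
  w.scan(/[aeiouy]{1,2}/).size
end

# Hypothetical exception table: words the heuristic gets wrong.
# "wanted" and "created" lose a voiced -ed; "idea" has its "ea" merged.
SYLLABLE_OVERRIDES = {
  'wanted'  => 2,
  'created' => 3,
  'idea'    => 3
}.freeze

# Look the word up first and fall back to the heuristic otherwise.
def syllables(word)
  key = word.downcase
  SYLLABLE_OVERRIDES.fetch(key) { heuristic_count(key) }
end

puts syllables('wanted')      # prints 2 (the heuristic alone gives 1)
puts syllables('themselves')  # prints 2 (falls through to the heuristic)
```

The table only needs the words that actually turn up in your content and change the result.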

EDIT:

I changed the code slightly to handle a leading 'y' and fixed the regex to handle 'les' endings better (such as in "candles").

Here's a comparison using the text in the question:

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

words = text.split(' ')

words.each do |word|
  old = count_syllables(word.dup)
  new = new_count(word.dup)
  puts "#{word}: \t#{old}\t#{new}" if old != new
end

The output is:

logorrhoea:     3   4
used:           2   1
makes:          2   1
themselves:     3   2

So it appears to be an improvement.

Pesto answered Sep 17 '22 13:09