Counting Syllables In A Word

I'm looking for a fully accurate statement of an algorithm to count syllables in words. What I find when I research is either inconsistent or something I know generates incorrect results. Does anyone have any suggestions on how to accomplish this? Thanks.

The algorithm I'm using now:

  1. Count the number of vowels in the word.
  2. Do not count double vowels separately ("rain" has 2 vowels but is only 1 syllable).
  3. If the last letter of the word is a vowel, do not count it ("side" is 1 syllable).
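For reference, here is a minimal Python sketch of one reading of the three rules above. The function name `count_syllables_naive` is made up for illustration, and treating only a, e, i, o, u as vowels is an assumption (the question does not say how to handle "y").

```python
VOWELS = set("aeiou")  # assumption: "y" is not treated as a vowel


def count_syllables_naive(word):
    """Count syllables using only the three rules listed in the question."""
    word = word.lower()
    count = 0
    prev_was_vowel = False
    for i, ch in enumerate(word):
        is_vowel = ch in VOWELS
        is_last = i == len(word) - 1
        # Rule 1: count vowels.
        # Rule 2: a run of adjacent vowels is counted only once.
        # Rule 3: a vowel in the final position is not counted.
        if is_vowel and not prev_was_vowel and not is_last:
            count += 1
        prev_was_vowel = is_vowel
    return count


print(count_syllables_naive("rain"))  # 1
print(count_syllables_naive("side"))  # 1
```

Both examples from the list come out as 1, so a sketch like this can help tell an implementation bug apart from a gap in the rules themselves.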

Are there any more rules I'm missing? I'm getting incorrect results in testing, and I'm trying to determine whether the algorithm itself is wrong or just my implementation of it.

Glenn1234 asked Feb 01 '12 12:02



2 Answers

Ambiguity is a huge issue in natural language processing, but some tasks can actually handle the ambiguity with good accuracy. It turns out syllabification is one of them, so don't listen to the other answers. :)

Syllabification

Heuristic-based

You could come up with rules that syllabify virtually the whole English vocabulary correctly, but they seem complicated to program correctly.

Corpus-based

As always, when hand-made algorithms don't help much, Natural Language Processing researchers use hand-tagged corpora containing the correct answers for given words. Learning algorithms are then trained on them and often provide great accuracy. You can use LingPipe's syllabification (see "English syllabification"), which follows this approach.

Exhaustive list

English only has so many words, which is how we came up with dictionaries. Such dictionaries often contain the correct syllabification. You could scrape reference.com. For example, the undulate entry contains « un·du·late », which is enough to know there are three syllables.

Other such dictionaries include Answers.com, The Free Dictionary, Merriam-Webster, and so on. Do read the Terms and Conditions, because automated retrieval may not be allowed. Also note that different dictionaries don't always agree with each other.

It won't help with new words or proper nouns, but I'd say it's going to be the most accurate method.
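Once you have entries in the « un·du·late » form, counting is trivial. Here is a rough sketch, assuming you have already collected the syllabified entries into a table yourself; the names `SYLLABIFIED`, `syllables_from_entry` and `count_syllables` are hypothetical, and the table holds only the one entry quoted above.

```python
# Hypothetical hand-built table: word -> syllabified dictionary entry.
SYLLABIFIED = {
    "undulate": "un·du·late",
}


def syllables_from_entry(entry, sep="·"):
    """Count the non-empty pieces of an entry split on the syllable separator."""
    return len([part for part in entry.strip().split(sep) if part])


def count_syllables(word):
    """Return the syllable count, or None when the word is not in the table."""
    entry = SYLLABIFIED.get(word.lower())
    if entry is None:
        return None  # new words and proper nouns end up here
    return syllables_from_entry(entry)


print(count_syllables("undulate"))  # 3
```

Returning `None` for unknown words keeps the lookup honest about its coverage, so you can decide separately what to do with words the dictionary doesn't have.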

About hyphenation

Another related problem got a lot more exposure: hyphenation. But don't use that! It is used in typesetting programs such as LaTeX, but only aims to provide some of the correct hyphens, without ever providing an incorrect one (high precision, low recall). It's interesting to note that there are only 14 exceptions, e.g. project, which is hyphenated differently depending on its part of speech (verb or noun).

Hyphenation programs

If you decide that it's enough for your needs, note that a few implementations of the TeX hyphenation algorithm exist in other programming languages, such as Python, Perl or Ruby.

Quentin Pradet answered Sep 30 '22 12:09


I'm looking for a fully accurate statement of an algorithm to count syllables in words

There isn't one. Period. Whatever algorithm you invent, I promise to find a counterexample. In certain languages (Armenian and Russian come to mind) the algorithm is pretty straightforward: count the number of vowels. In other languages, such as German, it's not as straightforward but still doable. In English, I am afraid, the mapping between letters and sounds is highly irregular.

For example,

coincidence: oi has to be counted as two syllables, but in boil it's only one. Also, not counting the final vowel is not always accurate. Consider the names Penelope or Hermione. Or banana.

Another curious case is when a syllable exists without a written vowel. For example, table is a two-syllable word, but the second syllable comes from the unwritten vowel sound between b and l. Also, don't forget about words of Greek origin, which can have a lot of consecutive vowels, e.g. onomatopoeia.
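To make this concrete, here is a small sketch that runs the question's three rules against the words above. The function `heuristic_count` is my own compact reading of those rules (drop a final vowel letter, then count groups of adjacent vowels), and the expected counts in `EXPECTED` are my assumption based on the usual pronunciations, not the output of any tool.

```python
import re


def heuristic_count(word):
    """The question's rules: ignore a final vowel, count vowel groups once."""
    w = word.lower()
    if w and w[-1] in "aeiou":
        w = w[:-1]  # rule 3: a final vowel is not counted
    return len(re.findall(r"[aeiou]+", w))  # rules 1 and 2


# Assumed syllable counts from the usual pronunciations.
EXPECTED = {
    "coincidence": 4,
    "boil": 1,
    "penelope": 4,
    "hermione": 4,
    "banana": 3,
    "table": 2,
    "onomatopoeia": 6,
}


def mismatches():
    """Map each failing word to (heuristic count, expected count)."""
    return {
        word: (heuristic_count(word), expected)
        for word, expected in EXPECTED.items()
        if heuristic_count(word) != expected
    }


for word, (got, expected) in mismatches().items():
    print(f"{word}: heuristic says {got}, expected {expected}")
```

Under these assumptions only boil comes out right; every other word is undercounted by at least one syllable.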

So, there is no accurate algorithm. The only way you can go is to try to find an algorithm which works in many (I am avoiding the word most) cases. But in this case you should redefine your requirements.

Armen Tsirunyan answered Sep 30 '22 10:09