To use it, say the word and clap your hands together each time you hear a vowel sound. For example, take the word "autumn": au-tumn. That's two vowel sounds, so it's two syllables even though autumn has three vowel letters: a, u and u. How many syllables did you get for each word?
Wondering why detect is 2 syllables? Contact Us!
The syllable count in word is not displayed directly, but with the Flesch Reading Ease test we can determine the average syllables used in the document.
Read about the TeX approach to this problem for the purposes of hyphenation. Especially see Frank Liang's thesis dissertation Word Hy-phen-a-tion by Com-put-er. His algorithm is very accurate, and then includes a small exceptions dictionary for cases where the algorithm does not work.
I stumbled across this page looking for the same thing, and found a few implementations of the Liang paper here: https://github.com/mnater/hyphenator or the successor: https://github.com/mnater/Hyphenopoly
That is unless you're the type that enjoys reading a 60 page thesis instead of adapting freely available code for non-unique problem. :)
Here is a solution using NLTK:
from nltk.corpus import cmudict
d = cmudict.dict()
def nsyl(word):
return [len(list(y for y in x if y[-1].isdigit())) for x in d[word.lower()]]
I'm trying to tackle this problem for a program that will calculate the flesch-kincaid and flesch reading score of a block of text. My algorithm uses what I found on this website: http://www.howmanysyllables.com/howtocountsyllables.html and it gets reasonably close. It still has trouble on complicated words like invisible and hyphenation, but I've found it gets in the ballpark for my purposes.
It has the upside of being easy to implement. I found the "es" can be either syllabic or not. It's a gamble, but I decided to remove the es in my algorithm.
private int CountSyllables(string word)
{
char[] vowels = { 'a', 'e', 'i', 'o', 'u', 'y' };
string currentWord = word;
int numVowels = 0;
bool lastWasVowel = false;
foreach (char wc in currentWord)
{
bool foundVowel = false;
foreach (char v in vowels)
{
//don't count diphthongs
if (v == wc && lastWasVowel)
{
foundVowel = true;
lastWasVowel = true;
break;
}
else if (v == wc && !lastWasVowel)
{
numVowels++;
foundVowel = true;
lastWasVowel = true;
break;
}
}
//if full cycle and no vowel found, set lastWasVowel to false;
if (!foundVowel)
lastWasVowel = false;
}
//remove es, it's _usually? silent
if (currentWord.Length > 2 &&
currentWord.Substring(currentWord.Length - 2) == "es")
numVowels--;
// remove silent e
else if (currentWord.Length > 1 &&
currentWord.Substring(currentWord.Length - 1) == "e")
numVowels--;
return numVowels;
}
This is a particularly difficult problem which is not completely solved by the LaTeX hyphenation algorithm. A good summary of some available methods and the challenges involved can be found in the paper Evaluating Automatic Syllabification Algorithms for English (Marchand, Adsett, and Damper 2007).
Why calculate it? Every online dictionary has this info. http://dictionary.reference.com/browse/invisible in·vis·i·ble
Bumping @Tihamer and @joe-basirico. Very useful function, not perfect, but good for most small-to-medium projects. Joe, I have re-written an implementation of your code in Python:
def countSyllables(word):
vowels = "aeiouy"
numVowels = 0
lastWasVowel = False
for wc in word:
foundVowel = False
for v in vowels:
if v == wc:
if not lastWasVowel: numVowels+=1 #don't count diphthongs
foundVowel = lastWasVowel = True
break
if not foundVowel: #If full cycle and no vowel found, set lastWasVowel to false
lastWasVowel = False
if len(word) > 2 and word[-2:] == "es": #Remove es - it's "usually" silent (?)
numVowels-=1
elif len(word) > 1 and word[-1:] == "e": #remove silent e
numVowels-=1
return numVowels
Hope someone finds this useful!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With