Detecting syllables in a word

Read about the TeX approach to this problem for the purposes of hyphenation. In particular, see Frank Liang's thesis, Word Hy-phen-a-tion by Com-put-er. His algorithm is very accurate, and it includes a small exceptions dictionary for the cases where the algorithm does not work.
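To give a feel for how the pattern-based approach works, here is a minimal sketch of the matching step. Each pattern carries digits between its letters, every matching pattern is laid over the word, the highest digit at each position wins, and an odd value marks an allowed break. The function names are mine, and the handful of patterns below are only the ones usually quoted for the word "hyphenation"; treat them as an illustrative subset, since a real pattern file has thousands of entries. The sketch also skips the exceptions dictionary and TeX's minimum fragment lengths. Keep in mind that hyphenation points are not always syllable boundaries.

```python
import re

# Illustrative subset only; a real pattern file is far larger.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

def parse_pattern(pattern):
    """Split 'hy3ph' into its letters and the weight at each inter-letter slot."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)  # weights[i] sits just before letters[i]
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns=PATTERNS):
    """Return the fragments of `word` split at the allowed break points."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[i] sits just before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    for k in range(1, len(word)):
        # A break before word[k] corresponds to scores[k + 1] (shifted by the leading dot).
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces

print("-".join(hyphenate("hyphenation")))  # hy-phen-ation
```

With only these patterns the result is hy-phen-ation, which shows why hyphenation output alone undercounts syllables: "ation" is left unsplit.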

I stumbled across this page looking for the same thing, and found a few implementations of the Liang paper here: https://github.com/mnater/hyphenator or the successor: https://github.com/mnater/Hyphenopoly

That is, unless you're the type who enjoys reading a 60-page thesis instead of adapting freely available code for a non-unique problem. :)

Here is a solution using NLTK:

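from nltk.corpus import cmudict

# Needs the CMU Pronouncing Dictionary data: run nltk.download('cmudict') once.
d = cmudict.dict()

def nsyl(word):
    # One count per pronunciation; vowel phonemes end in a stress digit.
    return [sum(1 for y in x if y[-1].isdigit()) for x in d[word.lower()]]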
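The reason this works is that each CMUdict pronunciation is a list of phonemes, and only the vowel phonemes carry a trailing stress digit, so counting digits counts syllables. Here is a small self-contained sketch of the same idea that also returns a default instead of raising KeyError for unknown words. The two entries are hand-copied samples for illustration, and `count_syllables` is a hypothetical helper name, not part of NLTK.

```python
# Hand-copied sample entries in CMUdict's phoneme notation (illustrative only).
SAMPLE_PRONUNCIATIONS = {
    "detect": [["D", "IH0", "T", "EH1", "K", "T"]],
    "autumn": [["AO1", "T", "AH0", "M"]],
}

def count_syllables(word, pronunciations, default=None):
    """Return the syllable count of the first listed pronunciation, or `default`."""
    entries = pronunciations.get(word.lower())
    if not entries:
        return default
    # Vowel phonemes end in a stress marker: 0, 1 or 2.
    return sum(1 for phoneme in entries[0] if phoneme[-1].isdigit())

print(count_syllables("detect", SAMPLE_PRONUNCIATIONS))  # 2
print(count_syllables("Autumn", SAMPLE_PRONUNCIATIONS))  # 2
print(count_syllables("xyzzy", SAMPLE_PRONUNCIATIONS))   # None
```

Passing the dictionary returned by `cmudict.dict()` in place of the sample table should behave the same way, since it maps each word to a list of phoneme lists.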

I'm trying to tackle this problem for a program that will calculate the Flesch-Kincaid and Flesch reading scores of a block of text. My algorithm uses what I found on this website: http://www.howmanysyllables.com/howtocountsyllables.html and it gets reasonably close. It still has trouble with complicated words like "invisible" and "hyphenation", but I've found it gets in the ballpark for my purposes.

It has the upside of being easy to implement. I found that a trailing "es" can be either syllabic or not. It's a gamble, but I decided to subtract one syllable for a trailing "es" in my algorithm.

private int CountSyllables(string word)
    {
        char[] vowels = { 'a', 'e', 'i', 'o', 'u', 'y' };
        string currentWord = word;
        int numVowels = 0;
        bool lastWasVowel = false;
        foreach (char wc in currentWord)
        {
            bool foundVowel = false;
            foreach (char v in vowels)
            {
                //don't count diphthongs
                if (v == wc && lastWasVowel)
                {
                    foundVowel = true;
                    lastWasVowel = true;
                    break;
                }
                else if (v == wc && !lastWasVowel)
                {
                    numVowels++;
                    foundVowel = true;
                    lastWasVowel = true;
                    break;
                }
            }

            //if full cycle and no vowel found, set lastWasVowel to false;
            if (!foundVowel)
                lastWasVowel = false;
        }
        //remove a trailing "es"; it's usually (but not always) silent
        if (currentWord.Length > 2 && 
            currentWord.Substring(currentWord.Length - 2) == "es")
            numVowels--;
        // remove silent e
        else if (currentWord.Length > 1 &&
            currentWord.Substring(currentWord.Length - 1) == "e")
            numVowels--;

        return numVowels;
    }
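For context, this is where the syllable count ends up. Both readability scores mentioned above depend only on the number of words, sentences and syllables in the text, so a rough syllable counter is often good enough. A minimal sketch using the standard published coefficients; the function names are mine, and how you count words and sentences is left to you:

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: 100 words in 5 sentences with 150 syllables in total.
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```

Because the syllable term is an average over all words, errors on individual words tend to matter less than a consistent bias, such as always dropping a trailing "es".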

This is a particularly difficult problem which is not completely solved by the LaTeX hyphenation algorithm. A good summary of some available methods and the challenges involved can be found in the paper Evaluating Automatic Syllabification Algorithms for English (Marchand, Adsett, and Damper 2007).

Why calculate it? Every online dictionary has this information. For example, http://dictionary.reference.com/browse/invisible shows in·vis·i·ble.

Bumping @Tihamer and @joe-basirico. Very useful function, not perfect, but good for most small-to-medium projects. Joe, I have rewritten your code in Python:

def countSyllables(word):
    vowels = "aeiouy"
    numVowels = 0
    lastWasVowel = False
    for wc in word:
        foundVowel = False
        for v in vowels:
            if v == wc:
                if not lastWasVowel:
                    numVowels += 1  # count only the first vowel of a run (don't count diphthongs)
                foundVowel = lastWasVowel = True
                break
        if not foundVowel:  #If full cycle and no vowel found, set lastWasVowel to false
            lastWasVowel = False
    if len(word) > 2 and word[-2:] == "es": #Remove es - it's "usually" silent (?)
        numVowels-=1
    elif len(word) > 1 and word[-1:] == "e":    #remove silent e
        numVowels-=1
    return numVowels

Hope someone finds this useful!
