RegEx: Understanding Syllable Counter Code

Question

I have used Dylan's question on here regarding JavaScript syllable counting, and more specifically artfulhacker's answer, in my own code and, regardless of which single or multi word string I feed it, the function is always able to correctly count the number of syllables.

I have limited experience with RegEx and not enough prior knowledge to decipher exactly what is happening in the following code without some help. I'm never happy using code I pulled from somewhere without knowing how it works. Could someone explain what is happening in the new_count(word) function below, help me decipher its use of RegEx, and explain how the function is able to count syllables correctly? Many thanks.

function new_count(word) {
  word = word.toLowerCase();                                     //word.downcase!
  if(word.length <= 3) { return 1; }                             //return 1 if word.length <= 3
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');   //word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  word = word.replace(/^y/, '');                                 //word.sub!(/^y/, '')
  return word.match(/[aeiouy]{1,2}/g).length;                    //word.scan(/[aeiouy]{1,2}/).size
}

Attilio · Accepted Answer

As far as I can see, we basically want to count the vowels, or vowel pairs, with some special cases. Let's start with the last line, which does exactly that, i.e. counts vowels and vowel pairs:

return word.match(/[aeiouy]{1,2}/g).length;

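This will match any vowel, or vowel pair. [...] is a character class: as we go through the string character by character, we have a match if the current character is one of those listed. {1,2} is the number of repetitions: we match one or two such characters in a row, and because the quantifier is greedy it takes two whenever two are available. (Note that there must be no space inside the braces; {1, 2} would not be treated as a quantifier.) The g flag makes match return every match in the string rather than only the first one, so .length is the number of vowel groups.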
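
To make the grouping concrete, here is a minimal sketch of what that last line sees. The helper name vowelGroups is hypothetical and only used for illustration; the || [] fallback is an addition that is not in the original function.

```javascript
// Hypothetical helper: returns the vowel groups the last line would count.
// The "|| []" fallback is an assumption added here, because match() returns
// null (not an empty array) when nothing matches.
const vowelGroups = (s) => s.match(/[aeiouy]{1,2}/g) || [];

console.log(vowelGroups("reading")); // [ 'ea', 'i' ]   -> 2 groups
console.log(vowelGroups("queue"));   // [ 'ue', 'ue' ]  -> a run of 4 vowels splits into 2 groups
console.log("hmm".match(/[aeiouy]{1,2}/g)); // null, so .length on it would throw
```

The "queue" case shows the limit of {1,2}: a run of more than two vowels is counted as more than one group.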

The other two lines are for special cases.

word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');

This line will remove 'syllables' from the end of the word, which are either:

Xes (where X is anything but any of 'laeiouy', e.g. 'zes')
ed
Xe (where X is anything but any of 'laeiouy', e.g. 'xe')

(I'm not really sure what the linguistic reasoning behind this is, but my guess is that these endings are usually silent in English, as in 'jumped' or 'makes', so they should not be counted as a syllable of their own. Note that the ed alternative has no character in front of it, so any word ending in 'ed' loses those two letters.) As for the regexp part: (?:...) is a non-capturing group. It is not really important in this case that the group is non-capturing; it just means that we want to group the whole alternation so that $ applies to every branch, but we do not need to refer back to what it matched. A capturing group, i.e. (...), would have worked too.

The [^...] is a negated character class. It matches any single character that is not one of those listed. (Compare this to the non-negated character class mentioned above.) The pipe symbol, i.e. |, is the alternation operator, which means that any one of the alternatives can match. Finally, the $ anchor matches the end of the string; it would only match at the end of each line if the m (multiline) flag were set, which it is not here.
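
As a rough illustration of the three alternatives, the sketch below applies the same regular expression on its own. The helper name stripSilentEnding is hypothetical, chosen only for this example.

```javascript
// Hypothetical helper wrapping the suffix-stripping line from new_count.
const stripSilentEnding = (w) =>
  w.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');

console.log(stripSilentEnding("makes"));  // "ma"    ("kes" matches [^laeiouy]es)
console.log(stripSilentEnding("jumped")); // "jump"  ("ed" matches the second alternative)
console.log(stripSilentEnding("cake"));   // "ca"    ("ke" matches [^laeiouy]e)
console.log(stripSilentEnding("table"));  // "table" ('l' is excluded, so "le" is kept)
```

The consonant in front of the silent 'e' is removed as well, which does no harm because only vowels are counted afterwards. The 'l' exclusion is what keeps the final syllable of words like 'table'.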

word = word.replace(/^y/, '');

This line removes a single 'y' from the beginning of the word (probably because a 'y' at the beginning acts as a consonant and should not be counted as a vowel, which makes sense in my opinion). ^ is the anchor for matching the beginning of the string (cf. $ mentioned above).
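
A small sketch of why this step matters, using a hypothetical groupCount helper that only applies the vowel-group count from the last line:

```javascript
// Hypothetical helper: number of vowel groups, as counted by the last line.
const groupCount = (s) => (s.match(/[aeiouy]{1,2}/g) || []).length;

const word = "young";
const withoutLeadingY = word.replace(/^y/, ''); // "oung"

console.log(groupCount(word));            // 2  ('yo' and 'u')
console.log(groupCount(withoutLeadingY)); // 1  ('ou')
console.log("happy".replace(/^y/, ''));   // "happy" (a 'y' elsewhere is untouched)
```

Without the replacement, the leading 'y' pairs up with the next vowel and splits the vowel run in two.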

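Note: the algorithm is only meant for a single word at a time. The ^ and $ anchors apply to the start and end of the whole string, so in a multi-word string only the first word gets the 'y' rule and only the last word gets the ending rule. Also, if the word contains no vowel at all, match returns null and .length throws an error.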
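
If you need to handle a whole phrase, one possible approach is to split it into words and sum the per-word counts. The following is only a sketch: safeCount and countPhrase are hypothetical names, the null guard is an addition to the original function, and splitting on whitespace and dropping non-letters is an assumption about what counts as a word.

```javascript
// Same steps as new_count, plus a guard for words without any vowel.
function safeCount(word) {
  word = word.toLowerCase();
  if (word.length <= 3) { return 1; }
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');
  word = word.replace(/^y/, '');
  const groups = word.match(/[aeiouy]{1,2}/g); // null when there is no vowel
  return groups ? groups.length : 0;
}

// Assumption: words are separated by whitespace; punctuation is dropped.
function countPhrase(text) {
  return text
    .split(/\s+/)
    .map((w) => w.replace(/[^a-z]/gi, ''))
    .filter(Boolean)                       // skip empty strings
    .reduce((sum, w) => sum + safeCount(w), 0);
}

console.log(countPhrase("The quick brown fox")); // 4
console.log(countPhrase("jumped over"));         // 3
console.log(safeCount("hmmm"));                  // 0 instead of a TypeError
```

Returning 0 for a vowel-less word is a design choice made for this sketch; returning 1 would be equally defensible.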

Tags:

javascript

regex

encryption

J Bloom

1 Answers

Attilio
