I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.
One solution I had found is as follows:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;
But this doesn't count any non-Latin characters (eg: Cyrillic, Hangul, etc); it skips over them completely.
Another one I put together:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;
But this doesn't count accurately unless the document ends in a space character. If a space character is appended to the value being counted it counts 1 word even with an empty document. Furthermore, if the document begins with a space character an extraneous word is counted.
Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?
The RegExp test() Method in JavaScript is used to test for match in a string. If there is a match this method returns true else it returns false. Where str is the string to be searched. This is required field.
To count a regex pattern multiple times in a given string, use the method len(re. findall(pattern, string)) that returns the number of matching substrings or len([*re. finditer(pattern, text)]) that unpacks all matching substrings into a list and returns the length of it as well.
This should do what you're after:
value.match(/\S+/g).length;
Rather than splitting the string, you're matching on any sequence of non-whitespace characters.
There's the added bonus of being easily able to extract each word if needed ;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With