I have this regex in JavaScript to remove words of three letters or fewer:
srcText = srcText.replace(/\s[a-z]{1,3}\s/gi,'');
It works, but when two consecutive matches are found, the second one isn't affected.
Example:
"... this is one sample of a text ... "
' one ' and ' a ' are not removed unless I run the code a second time:
srcText = srcText.replace(/\s[a-z]{1,3}\s/gi,'');
So I'd have to run the code n times, where n is the number of consecutive matches in srcText.
For testing purposes:
http://regexpal.com/
Sample text:
http://www.gutenberg.org/files/521/521-0.txt (e.g. the 4th paragraph)
Is my regex missing something, or does JavaScript not allow this kind of recursiveness?
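The cause is easiest to see by listing what the original pattern actually matches in the sample sentence. This is a small sketch using String.prototype.matchAll, which is ES2020, so it assumes a reasonably recent browser or Node.js:

```javascript
// The sample sentence from the question and the original pattern.
const sample = "... this is one sample of a text ... ";
const spaced = /\s[a-z]{1,3}\s/gi;

// Record each match together with the index where it starts.
const found = [...sample.matchAll(spaced)].map((m) => [m[0], m.index]);

for (const [text, index] of found) {
  console.log(JSON.stringify(text), "at index", index);
}
// Only " is " (index 8) and " of " (index 22) are matched. After " is ",
// the search resumes at the "o" of "one", where \s cannot match; the same
// happens to " a " after " of ". Global matches never overlap, so the
// shared space can only be used once.
```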
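Your pattern misses consecutive short words because each match consumes the whitespace on both sides of the word: after " is " has matched, the search resumes at the "o" of "one", and there is no space left in front of it for the leading \s to match. JavaScript's regular expressions (like most other flavours) support the \b escape sequence, which matches a word boundary without consuming any characters (it is zero-width). In your expression, simply replace both \s with \b, giving /\b[a-z]{1,3}\b/gi, and every short word is removed in a single pass.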
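Applied to the sample sentence from the question, one pass of the \b version removes all four short words. A minimal sketch; the final whitespace clean-up is an optional extra, not part of the suggested fix, included because removing only the words leaves their surrounding spaces behind:

```javascript
const sample = "... this is one sample of a text ... ";

// \b asserts a word/non-word transition without consuming characters,
// so the spaces stay available and consecutive short words all match.
const shortWord = /\b[a-z]{1,3}\b/gi;

const removed = sample.replace(shortWord, "");
console.log(JSON.stringify(removed)); // → "... this   sample   text ... "

// Optional extra step: collapse the leftover runs of whitespace.
const tidied = removed.replace(/\s+/g, " ").trim();
console.log(JSON.stringify(tidied)); // → "... this sample text ..."
```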
Note that a word boundary also occurs next to dashes, dots and other non-word characters, not only next to spaces. So this-test - more. has boundaries at: |this|-|test| - |more|. Usually this is desirable, but it is a difference in behaviour from \s that is worth knowing about.
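The boundaries can be made visible by replacing every \b match with a marker. The made-up phrase "end-to-end test" (purely illustrative) then shows how the difference affects the short-word filter:

```javascript
// Insert "|" at every position where \b matches. Dashes, dots and spaces
// are all non-word characters, so each one sits next to a boundary.
const marked = "this-test - more.".replace(/\b/g, "|");
console.log(marked); // → |this|-|test| - |more|.

// With \b, the parts of a hyphenated compound count as separate words;
// with \s they do not, because there is no whitespace around them.
const compound = "end-to-end test";
const withBoundary = compound.replace(/\b[a-z]{1,3}\b/gi, "");
const withSpaces = compound.replace(/\s[a-z]{1,3}\s/gi, "");
console.log(JSON.stringify(withBoundary)); // → "-- test"
console.log(JSON.stringify(withSpaces));   // → "end-to-end test"
```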
As noted by Sam in the comments, a word boundary is identified as:
(^\w|\w\W|\W\w|\w$)
That is, a non-word character followed by a word character, or a word character followed by a non-word character, where the start and end of the string count as non-word characters. (Note, however, that \b is zero-width, so it is not simply shorthand for that expression: it matches the position between the two characters and consumes neither of them.)
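That definition can be checked by hand. The sketch below uses a hypothetical helper, isWordBoundary, that tests the gap between two characters rather than the characters themselves (the zero-width part), and compares the positions it finds with the ones the regex engine reports for \b:

```javascript
// Word-character test; the plain /\w/ used here matches [A-Za-z0-9_].
// `undefined` (before the start or past the end of the string) is
// treated as a non-word character.
function isWordChar(ch) {
  return ch !== undefined && /\w/.test(ch);
}

// Hypothetical helper: position `pos` (0..text.length) is a word boundary
// when exactly one of its two neighbours is a word character, i.e. one of
// ^\w, \w\W, \W\w or \w$. It inspects a gap and consumes nothing.
function isWordBoundary(text, pos) {
  return isWordChar(text[pos - 1]) !== isWordChar(text[pos]);
}

// Every boundary position according to the hand-written rule.
function boundaryPositions(text) {
  const positions = [];
  for (let pos = 0; pos <= text.length; pos++) {
    if (isWordBoundary(text, pos)) positions.push(pos);
  }
  return positions;
}

// The positions the regex engine itself reports for \b.
function regexBoundaryPositions(text) {
  return [...text.matchAll(/\b/g)].map((m) => m.index);
}

const example = "this-test - more.";
console.log(JSON.stringify(boundaryPositions(example)));      // → [0,4,5,9,12,16]
console.log(JSON.stringify(regexBoundaryPositions(example))); // → [0,4,5,9,12,16]
```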