RegEx consecutive matches

I have this regex in JavaScript to remove words of three letters or fewer:

srcText = srcText.replace(/\s[a-z]{1,3}\s/gi,'');

It works, but when two consecutive matches are found, the second one isn't affected:

Ex.:

"... this is one sample of a text ... "

' one ' and ' a ' won't be affected unless I run the code one more time:

srcText = srcText.replace(/\s[a-z]{1,3}\s/gi,'');

So I'd have to run the code n times, n being the number of consecutive matches in srcText.

For testing purposes:

http://regexpal.com/

sample text:

http://www.gutenberg.org/files/521/521-0.txt (say, 4th paragraph)

Is my regex missing something, or does JavaScript not allow this kind of recursion?

Azevedo asked Jun 11 '26 01:06


1 Answer

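Each match of your expression consumes the whitespace on both sides of the word, so the space shared by two adjacent short words is already used up when the engine looks for the next match. JavaScript's regular expressions (and most others too) support the \b escape sequence, which matches (zero-width) word boundaries. In your expression, simply replace the two \s with \b and it will work.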
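To make the difference concrete, here is a small sketch using the sample sentence from the question. The variable names are just for illustration, and the final whitespace cleanup is my own addition rather than part of the suggestion above: since \b consumes nothing, the spaces around each removed word stay behind.

```javascript
// Sample sentence from the question.
const sample = "this is one sample of a text";

// Original pattern: each match consumes the whitespace on both sides,
// so the space before "one" is already used up by the match on " is ".
const withSpaces = sample.replace(/\s[a-z]{1,3}\s/gi, '');

// \b is zero-width, so adjacent short words can each match in a single pass.
const withBoundaries = sample.replace(/\b[a-z]{1,3}\b/gi, '');

// Optional cleanup (an assumption about the desired output, not part of the fix):
// collapse the runs of spaces left behind and trim the ends.
const tidy = withBoundaries.replace(/\s+/g, ' ').trim();

console.log(JSON.stringify(withSpaces));     // "thisone samplea text"
console.log(JSON.stringify(withBoundaries)); // "this   sample   text"
console.log(JSON.stringify(tidy));           // "this sample text"
```

Note that the original pattern also glues the neighbouring words together ("thisone"), because it deletes both surrounding spaces along with the word.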

Note that "word boundary" also applies around dashes, dots, etc. So this-test - more. will have boundaries at: |this|-|test| - |more|. Usually this is desirable, but it is a difference in behaviour from \s which is worth knowing about.
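A quick way to see where the boundaries fall is to insert a marker at every \b. The second half of this sketch shows what that means for hyphenated text, and one possible alternative if you want whitespace-only delimiters: zero-width lookarounds on \S. That alternative is my assumption about what might be wanted, and it relies on lookbehind support (ES2018), so check your target environment.

```javascript
// Mark every word boundary with "|".
const marked = "this-test - more.".replace(/\b/g, '|');

// Hypothetical input containing a hyphenated compound.
const hyphenated = "a state-of-the-art tool";

// \b treats the hyphens as boundaries, so parts of the compound are removed.
const byBoundary = hyphenated.replace(/\b[a-z]{1,3}\b/gi, '');

// Zero-width lookarounds: only match words delimited by whitespace
// (or by the start/end of the string), without consuming the whitespace.
const byWhitespace = hyphenated.replace(/(?<!\S)[a-z]{1,3}(?!\S)/gi, '');

console.log(JSON.stringify(marked));       // "|this|-|test| - |more|."
console.log(JSON.stringify(byBoundary));   // " state--- tool"
console.log(JSON.stringify(byWhitespace)); // " state-of-the-art tool"
```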

As noted by Sam in the comments, a word boundary is identified as:

(^\w|\w\W|\W\w|\w$)

That is, a non-word character followed by a word character, or a word character followed by a non-word character, where the start and end of the string are taken as non-word characters. Note, however, that \b is zero-width, so it isn't just a shorthand for that expression: the expression consumes the characters on either side of the boundary, while \b consumes nothing.
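The zero-width point can be checked directly. The sketch below compares \b against a hand-written lookaround version, which should be equivalent for this input, and then runs the consuming expression for contrast. The positions helper is a made-up name for illustration, and the lookbehind syntax again assumes an ES2018-capable engine.

```javascript
// Collect the start index of every match of a global regex.
function positions(re, text) {
  return [...text.matchAll(re)].map((m) => m.index);
}

const text = "this-test - more.";

// Built-in zero-width word boundary.
const builtIn = positions(/\b/g, text);

// Hand-written zero-width version: a word character on exactly one side.
const lookaround = positions(/(?<=\w)(?!\w)|(?<!\w)(?=\w)/g, text);

// The character-consuming expression, applied to a shorter string.
const consumed = "ab cd".match(/(^\w|\w\W|\W\w|\w$)/g);

console.log(builtIn);    // [ 0, 4, 5, 9, 12, 16 ]
console.log(lookaround); // [ 0, 4, 5, 9, 12, 16 ]
console.log(consumed);   // [ 'a', 'b ', 'd' ]
```

In "ab cd" there are four word boundaries, but the consuming expression reports only three matches: the space is swallowed by the 'b ' match, so the boundary in front of 'c' is never seen. This is the same effect that caused the skipped words in the question.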

Dave answered Jun 12 '26 16:06


