
How to match '+abc' but not '++abc' without lookbehind?

In a sentence similar to:

Lorem ipsum +dolor ++sit amet.

I'd like to match the +dolor but not the ++sit. I can do it with a lookbehind, but since JavaScript does not support lookbehind, I'm struggling to build a pattern for it.

So far I've tried it with:

(?:\+(.+?))(?=[\s\.!\!]) - but it matches both words
(?:\+{1}(.+?))(?=[\s\.!\!]) - the same here - both words are matched

and to my surprise a pattern like:

(?=\s)(?:\+(.+?))(?=[\s\.!\!])

doesn't match anything. I thought I could trick it by using \s, or later also ^, before the + sign, but it doesn't seem to work like that.


EDIT - background information:

It's not strictly part of the question, but sometimes it helps to know what this is all for, so here is a short explanation to address some of your questions and comments:

  • any word in any order can be marked by either a + or a ++
  • each word and its marking will be replaced by a <span> later
  • cases like lorem+ipsum are considered invalid, because that would be like splitting a word (ro+om) or writing two words together as one (myroom), so they have to be corrected anyway. It is not an error if the pattern matches them, but it should at least match the normal cases like in the example above
  • I use a lookahead like (?=[\s\.!\!]) so that I can match words in any language and not only \w characters
t3chb0t asked Jan 14 '15 11:01


3 Answers

One way would be to match one additional character and ignore that (by putting the relevant part of the match into a capturing group):

(?:^|[^+])(\+[^\s+.!]+)

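However, this breaks down if potential matches can be directly adjacent to each other: in +foo+bar, for example, only +foo is found, because the character before +bar has already been consumed by the previous match.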

Test it live on regex101.com.

Explanation:

(?:         # Match (but don't capture)
 ^          # the position at the start of the string
|           # or
 [^+]       # any character except +.
)           # End of group
(           # Match (and capture in group 1)
 \+         # a + character
 [^\s+.!]+  # one or more characters except [+.!] or whitespace.
)           # End of group
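
A minimal sketch of how the pattern above might be used from JavaScript, reading group 1 and discarding the extra leading character. The helper name findSingleMarked is made up for this illustration and is not part of the answer.

```javascript
// Hypothetical helper (the name is an assumption): collect group 1 of every match.
function findSingleMarked(text) {
    var re = /(?:^|[^+])(\+[^\s+.!]+)/g;
    var found = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        // m[0] may also contain the ignored leading character; m[1] is the marked word.
        found.push(m[1]);
    }
    return found;
}

// Only the single-+ word is collected; ++sit is skipped.
console.log(findSingleMarked('Lorem ipsum +dolor ++sit amet.'));
// Directly adjacent candidates: only +foo is collected.
console.log(findSingleMarked('+foo+bar'));
```

Every match contains at least two characters, so the loop always advances and needs no zero-length guard.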
Tim Pietzcker answered Nov 05 '22 12:11


\+\+|(\+\S+)

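Grab the content from capturing group 1. The regex uses the trick described in this answer: the first alternative consumes the unwanted ++ without capturing it, so only a single + followed by non-whitespace characters ends up in group 1.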

Demo on regex101

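var re = /\+\+|(\+\S+)/g;
var str = 'Lorem ipsum +dolor ++sit amet.';
var m;
var o = [];  // collects the words marked with a single +

while ((m = re.exec(str)) !== null) {
    // Guard against zero-length matches, which would otherwise loop forever
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }

    // Group 1 is only set when the second alternative (a single +) matched
    if (m[1] != null) {
        o.push(m[1]);
    }
}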

If you have input like +++donor, use:

\+\++|(\+\S+)
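
As a rough sketch, the same pattern could also drive the replacement step mentioned in the question by passing a callback to String.prototype.replace. The function name wrapSingleMarked, the "marked" class, and the choice to drop the leading + are all assumptions made for this example.

```javascript
// Sketch only: the function name and the "marked" class are assumptions.
function wrapSingleMarked(text) {
    return text.replace(/\+\++|(\+\S+)/g, function (match, word) {
        // Group 1 is undefined when the first alternative (two or more +) matched;
        // in that case the text is returned unchanged.
        if (word === undefined) {
            return match;
        }
        // Drop the leading + and wrap the rest.
        return '<span class="marked">' + word.slice(1) + '</span>';
    });
}

console.log(wrapSingleMarked('Lorem +dolor ++sit +++donor amet.'));
```

Because \S+ takes every non-whitespace character, punctuation directly after a marked word (as in +dolor.) would end up inside the span as well.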
vks answered Nov 05 '22 12:11


The following regex seems to be working for me:

var re = / (\+[a-zA-Z0-9]+)/;  // Note the space after the opening '/'

Demo

https://www.regex101.com/r/uQ3wE7/1
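
A quick way to try this pattern, sketched with the g flag added so that every match is collected (the answer's pattern has no flags). The helper name findAfterSpace is made up for the example.

```javascript
// Hypothetical helper (the name is an assumption); the g flag is added for this sketch.
function findAfterSpace(text) {
    var re = / (\+[a-zA-Z0-9]+)/g;
    var found = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        found.push(m[1]);
    }
    return found;
}

// +dolor is collected; ++sit is skipped because a second + follows the first.
console.log(findAfterSpace('Lorem ipsum +dolor ++sit amet.'));
// Nothing is collected here: the pattern requires a space before the +.
console.log(findAfterSpace('+dolor sit amet.'));
```

Since the pattern needs a literal space before the +, a marked word at the very start of the string is not found, and the character class only covers ASCII letters and digits.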

Vivendi answered Nov 05 '22 13:11