Regex expression and next symbol is not '('

Tags:

regex

I want a regular expression that matches only when the symbol right after the matched value is not "(".

I have the following base regex:

(([_A-Za-z]([_\w])+)|([A-Za-z]))

and text for example:

a3+red+42+_dv+Sy(w12+44)

The desired regex should return:

a3, red, _dv, w12

The base regex returns:

a3, red, _dv, Sy, w12

but I need to exclude 'Sy', because the next symbol is "(".

I tried the following:

(([_A-Za-z]([_\w])+)|([A-Za-z]))(\b)

but it returns

a3+, red+, _dv+, w12)

I don't need the next symbol in the result. I only want a value included if the symbol after it is not "(".

Oleg Sh asked Dec 03 '16 17:12


2 Answers

You need to do three things:

  • enclose the pattern in an atomic group (or at least the first part of your alternation that contains a quantifier)

  • start your pattern with a word boundary (to quickly avoid useless positions)

  • use a lookahead assertion to test the next character if any

result:

\b((?>[_A-Za-z]\w+)|[A-Za-z]\b)(?!\()

The first point is important because it blocks backtracking in a situation like Abcd(. Without the atomic group, the pattern succeeds and returns Abc: the quantifier gives back the d, so the lookahead no longer sees (. With an atomic group, the pattern matches Abcd and, since it cannot backtrack, the lookahead fails at the next character and so does the whole match.
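
The backtracking behaviour on Abcd( can be reproduced with a short Python sketch. Python's re module only accepts (?>...) from version 3.11 onward, so this sketch emulates the atomic group with a capturing lookahead followed by a backreference, (?=(...))\1. That emulation, the names plain and atomic, and the use of only the first alternative of the pattern are assumptions of this example, not part of the answer.

```python
import re

text = "Abcd("

# No atomic group: when (?!\() fails after "Abcd", \w+ gives back "d",
# the lookahead then sees "d" instead of "(" and the match succeeds.
plain = re.compile(r"\b[_A-Za-z]\w+(?!\()")

# Atomic behaviour emulated with (?=(X))\1: a lookahead never gives
# characters back once it has matched, so the engine cannot retry
# with a shorter identifier.
atomic = re.compile(r"\b(?=([_A-Za-z]\w+))\1(?!\()")

plain_match = plain.search(text)
atomic_match = atomic.search(text)

print(plain_match.group(0))  # Abc
print(atomic_match)          # None
```

On Python 3.11 or later the second pattern can be written directly as \b(?>[_A-Za-z]\w+)(?!\().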

Another way to write the pattern:

\b(?>[A-Za-z]\w*|_\w+)(?!\()
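
As a rough check, the pattern above can be run against the sample text from the question. This is a minimal Python sketch, not something from the answer: the variable names are illustrative, and because Python's re only supports (?>...) from 3.11 onward, it falls back to a lookahead-plus-backreference emulation on older versions.

```python
import re
import sys

text = "a3+red+42+_dv+Sy(w12+44)"  # sample input from the question

if sys.version_info >= (3, 11):
    # Native atomic group, available in Python's re since 3.11.
    pattern = re.compile(r"\b(?>[A-Za-z]\w*|_\w+)(?!\()")
else:
    # Fallback for older versions: a lookahead never gives characters
    # back once it has matched, so (?=(X))\1 behaves like (?>X).
    pattern = re.compile(r"\b(?=([A-Za-z]\w*|_\w+))\1(?!\()")

# group(0) is the whole match in both variants.
names = [m.group(0) for m in pattern.finditer(text)]
print(names)  # ['a3', 'red', '_dv', 'w12']
```

Sy is skipped because the identifier is consumed as a whole and the lookahead then sees the "(" that follows it.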

Casimir et Hippolyte answered Oct 08 '22 16:10


If you don't expect a lone _ in your input, how about this regex:

\b[^\W\d]\w*+(?!\()
  • \b matches a word boundary
  • \w matches a word character
  • [^\W\d] matches a word character that is not a digit, i.e. [_a-zA-Z] for ASCII input
  • (?!\() if not followed by (

See demo at regex101

The + after the * quantifier makes it possessive, which prevents backtracking into \w* when the lookahead fails.
Alternatively, you could put another word boundary \b after \w* instead of using the possessive quantifier (see another demo at regex101).
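
The word-boundary variant can be sketched in Python, where it works on any version because it needs neither a possessive quantifier nor an atomic group. The exact pattern \b[^\W\d]\w*\b(?!\() is my reading of "another word boundary", and the names bounded and unbounded are illustrative.

```python
import re

text = "a3+red+42+_dv+Sy(w12+44)"  # sample input from the question

# The trailing \b forces the match to end where the identifier ends, so
# backtracking into \w* cannot produce a shorter match such as "S".
bounded = re.compile(r"\b[^\W\d]\w*\b(?!\()")

# For contrast: no possessive quantifier and no trailing \b.
unbounded = re.compile(r"\b[^\W\d]\w*(?!\()")

bounded_result = bounded.findall(text)
unbounded_result = unbounded.findall(text)

print(bounded_result)    # ['a3', 'red', '_dv', 'w12']
print(unbounded_result)  # ['a3', 'red', '_dv', 'S', 'w12']
```

The second result shows the problem that the possessive quantifier or the trailing \b is there to prevent: the engine shortens Sy to S so that the lookahead passes.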

bobble bubble answered Oct 08 '22 14:10