I want an expression that matches a value only when the next symbol after it is not "(".
I have the following base regex:
(([_A-Za-z]([_\w])+)|([A-Za-z]))
and, for example, this text:
a3+red+42+_dv+Sy(w12+44)
The desired regex should return:
a3, red, _dv, w12
The base regex returns:
a3, red, _dv, Sy, w12
but I need to exclude 'Sy', because the next symbol is "(".
I tried the following:
(([_A-Za-z]([_\w])+)|([A-Za-z]))(\b)
but it returns
a3+, red+, _dv+, w12)
I don't need the next symbol in the match; I only want to include a value if the next symbol is not "(".
You need to do three things:
1. Enclose the pattern in an atomic group (or at least the first part of your alternation, the one that contains a quantifier).
2. Start your pattern with a word boundary (to quickly skip useless positions).
3. Use a lookahead assertion to test the next character, if any.
Result:
\b((?>[_A-Za-z]\w+)|[A-Za-z]\b)(?!\()
Point 1 is important to block the backtracking mechanism in this kind of situation: Abcd(
Without it, the quantifier gives back one character, the lookahead then sees d instead of (, and the pattern succeeds and returns Abc. With an atomic group, the pattern matches Abcd and, since it cannot go back, it fails at the next character.
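The backtracking effect described above can be reproduced with a short Python sketch. It assumes Python's built-in `re` module, which accepts atomic groups only from version 3.11 on, so the atomic variant is guarded by a version check; the sample string `Abcd(x)` is made up for illustration.

```python
import re
import sys

sample = "Abcd(x)"

# Plain group: \w+ may give back "d", so the lookahead then sees "d" instead of "(".
plain = r"\b([_A-Za-z]\w+|[A-Za-z]\b)(?!\()"
plain_result = re.findall(plain, sample)
print(plain_result)  # ['Abc', 'x']

# Atomic group (the pattern from the answer): once "Abcd" is matched, nothing is given back.
atomic = r"\b((?>[_A-Za-z]\w+)|[A-Za-z]\b)(?!\()"
# The (?>...) syntax is only accepted by Python's re module from 3.11 on.
atomic_result = re.findall(atomic, sample) if sys.version_info >= (3, 11) else None
print(atomic_result)  # ['x'] on Python 3.11+
```

Both patterns contain one capturing group, so `re.findall` returns the text of that group, which here equals the whole match.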
Another way to write the pattern:
\b(?>[A-Za-z]\w*|_\w+)(?!\()
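As a quick check, this alternative form can be run against the sample text from the question. This is only a sketch assuming Python 3.11 or newer (older versions of `re` reject the atomic group syntax, hence the guard).

```python
import re
import sys

text = "a3+red+42+_dv+Sy(w12+44)"

# A letter followed by any word characters, or an underscore followed by at
# least one word character; the atomic group keeps "Sy" from being shortened.
pattern = r"\b(?>[A-Za-z]\w*|_\w+)(?!\()"

# Atomic groups are only understood by Python's re module from 3.11 on.
found = re.findall(pattern, text) if sys.version_info >= (3, 11) else None
print(found)  # ['a3', 'red', '_dv', 'w12'] on Python 3.11+
```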
If you don't expect a single _ in your input, how about this regex:
\b[^\W\d]\w*+(?!\()
\b matches a word boundary
\w matches a word character
[^\W\d] matches [_a-zA-Z]
(?!\() asserts that the match is not followed by (
See demo at regex101
The + after the * quantifier makes it possessive, which prevents backtracking at the lookahead.
Instead, you could use another word boundary \b after the quantifier (see another demo at regex101).
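The word-boundary variant can be tried in Python as follows. The exact pattern `\b[^\W\d]\w*\b(?!\()` is my reading of "use another word boundary" and is not spelled out in the answer. The `re.ASCII` flag is an added assumption so that `[^\W\d]` really equals `[_a-zA-Z]`; without it, Python's `\w` also matches non-ASCII letters. The possessive form `\w*+` would additionally need Python 3.11 or newer.

```python
import re

text = "a3+red+42+_dv+Sy(w12+44)"

# re.ASCII keeps \w and \d ASCII-only, so [^\W\d] is exactly [_a-zA-Z].
# The trailing \b stops \w* from giving back characters to satisfy the lookahead:
# after giving one back, the position sits between two word characters.
boundary = re.compile(r"\b[^\W\d]\w*\b(?!\()", re.ASCII)

names = boundary.findall(text)
print(names)  # ['a3', 'red', '_dv', 'w12']

# Caveat from the answer: a lone underscore is also accepted by this pattern.
print(boundary.findall("_ + x"))  # ['_', 'x']
```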