I was writing a regex pattern for a string. The string being a constant type/structure. What I mean is, it looks like(this format is not so important, have a look at the example next)-
[Data Code Format]: POP N/N/N (N: object): JSON object data
Here N represents a number or digit. and what's inside [ ] is a set of string block. But, this format is constant.
So, I wrote a regex-
\s*((?:\S+\s+){1}\S+)\s*(?:\((?:.*?)\))?:\s*(\S*)\s*(\w+)
Keeping this string example in mind-
%DATA-4-JSON: POP 0/1/0 (1: object): JSON object data
It works perfectly, but, what I see on regex101.com is that there is a successful match. But, it has undergone 330 steps to achieve this.
Screenshot-
My question is, its taking 330 steps to achieve this(atleast in my mind I feel its pretty heavy), which I guess can be achieved using if-else
and other comparisons with lesser steps right?
I mean, is regex string parsing so heavy? 330 steps for like 10000's of strings I need to parse is going to be heavy right?
When you are using regexps, they can be costly if you use backtracking. When you use quantifiers with consequent patterns that may match one another (and in case the patterns between them may match empty strings (usually, declared with *
, {0,x}
or ?
quantifiers)), backtracking might play a bad trick on you.
What is bad about your regex?
A slight performance issue is caused by an unnecessary non-capturing group (?:\S+\s+){1}
. It can be written as \S+\s+
and it will already decrease the number of steps (a bit, 302 steps). However, the worst part is that \S
matches the :
delimiter. Thus, the regex engine has to try a lot of possible routes to match the beginning of your expected match. Replace that \S+\s+\S
with [^:\s]+\s+[^:\s]+
, and the step amount will decrease to 159!
Now, coming to (?:\((?:.*?)\))?
- removing the unnecessary inner non-capturing group gives another slight improvement (148 steps), and replacing .*?
with a negated character class gives another boost (139 steps).
I'd use this for now:
\s*([^:\s]+\s+[^:\s]+)\s*(?:\([^()]*\))?:\s*(\S*)\s*(\w+)
If the initial \s*
was obligatory, it would be another enhancement, but you would have different matches.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With