I tried to compare the performance of capturing and non-capturing groups in a regex. The difference between the two turns out to be very slight. Is this result normal?
[root@Sensor ~]# ll -h sample.log
-rw-r--r-- 1 root root 21M Oct 20 23:01 sample.log
[root@Sensor ~]# time grep -ciP '(get|post).*' sample.log
20000
real 0m0.083s
user 0m0.070s
sys 0m0.010s
[root@Sensor ~]# time grep -ciP '(?:get|post).*' sample.log
20000
real 0m0.083s
user 0m0.077s
sys 0m0.004s
Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
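As a small illustration of the (dog) example, here is a minimal sketch using Python's `re` module. This is an assumption for illustration: the quoted text talks about Java, but the `(...)` syntax behaves the same way here. The sample sentence and variable names are made up.

```python
import re

# A capturing group: the parentheses treat "dog" as one unit and save its match.
pattern = re.compile(r"(dog)s?")

# Hypothetical sample text, chosen only for this illustration.
match = pattern.search("two dogs and a cat")
assert match is not None

whole_match = match.group(0)   # the entire match, including the optional "s"
captured = match.group(1)      # only the text saved by the first capturing group
group_count = pattern.groups   # number of capturing groups in the pattern

print(whole_match, captured, group_count)
```

Group 0 is always the whole match; numbered groups start at 1 and follow the order of the opening parentheses.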
It should be mentioned that there's no performance difference in searching between capturing and non-capturing groups; neither form is any faster than the other.
A non-capturing group lets us use the grouping inside a regular expression without changing the numbers assigned to the back references (explained in the next section). This can be very useful in building large and complex regular expressions.
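The effect on group numbering can be sketched as follows, again in Python's `re` module as an assumed stand-in. The request line is a hypothetical example, not taken from sample.log.

```python
import re

# Hypothetical request line, used only for illustration.
text = "GET /index.html"

# Both groups capture: the method is group 1 and the path is group 2.
capturing = re.compile(r"(get|post)\s(\S+)", re.IGNORECASE)
# The first group is non-capturing, so the path becomes group 1.
non_capturing = re.compile(r"(?:get|post)\s(\S+)", re.IGNORECASE)

m1 = capturing.search(text)
m2 = non_capturing.search(text)

print(capturing.groups, m1.groups())
print(non_capturing.groups, m2.groups())
```

Both patterns match the same text; only the number of saved groups, and therefore the index of the path group, differs.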
Typically, non-capturing groups perform better than capturing groups, because they require less allocation of memory, and do not make a copy of the group match. However, there are three important caveats:
grep itself takes a significant amount of time and memory, and may overwhelm any small improvement gained by using non-capturing group(s).
If I use a lot of capturing groups, the difference seems to be larger.
Thanks, everyone. :)
[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+" sample.log
20000
real 0m0.057s
user 0m0.051s
sys 0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+" sample.log
20000
real 0m0.061s
user 0m0.053s
sys 0m0.006s
[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+(get|post)" sample.log
1880
real 0m0.839s
user 0m0.833s
sys 0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+(?:get|post)" sample.log
1880
real 0m0.744s
user 0m0.741s
sys 0m0.003s
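A similar comparison can be run inside one process, which keeps process start-up and file I/O out of the measurement. The sketch below is only an approximation of the tests above. It assumes Python's `re` engine, which is not the PCRE engine behind `grep -P`, and it uses synthetic lines as a hypothetical stand-in for sample.log, so the absolute numbers will not match the timings shown here.

```python
import re
import timeit

# Hypothetical stand-in for sample.log: synthetic request lines, not the original data.
lines = [f"GET /page/{i} HTTP/1.1" for i in range(5000)]

patterns = {
    "capturing": re.compile(r"(get|post)\s[^\s]+", re.IGNORECASE),
    "non-capturing": re.compile(r"(?:get|post)\s[^\s]+", re.IGNORECASE),
}


def count_matching_lines(pattern, lines):
    """Count lines that contain at least one match, similar to grep -c."""
    return sum(1 for line in lines if pattern.search(line))


counts = {}
timings = {}
for name, pattern in patterns.items():
    counts[name] = count_matching_lines(pattern, lines)
    # Keep the best of several repeats to reduce noise from other processes.
    timings[name] = min(
        timeit.repeat(lambda: count_matching_lines(pattern, lines), number=5, repeat=3)
    )

for name in patterns:
    print(f"{name:>14}: {counts[name]} matching lines, {timings[name]:.4f}s")
```

Taking the minimum over several repeats is a common way to filter out scheduling noise. A gap of a few milliseconds, like the one between the first pairs of runs above, is hard to tell apart from noise on a single run.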