
capturing group VS non-capturing group

Tags:

regex

I tried to compare the performance of capturing and non-capturing groups in a regex. The difference between the two turned out to be very slight. Is this result normal?

[root@Sensor ~]# ll -h sample.log
-rw-r--r-- 1 root root 21M Oct 20 23:01 sample.log

[root@Sensor ~]# time grep -ciP '(get|post).*' sample.log
20000

real    0m0.083s
user    0m0.070s
sys     0m0.010s

[root@Sensor ~]# time grep -ciP '(?:get|post).*' sample.log
20000

real    0m0.083s
user    0m0.077s
sys     0m0.004s
Mr.kang asked Oct 20 '15 17:10


People also ask

What does a non-capturing group mean?

Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence.

What is a capturing group?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Are non-capturing groups faster?

It should be mentioned that there's no performance difference in searching between capturing and non-capturing groups; neither form is any faster than the other.

What is the purpose of a non-capturing group in regex?

A non-capturing group lets us use the grouping inside a regular expression without changing the numbers assigned to the back references (explained in the next section). This can be very useful in building large and complex regular expressions.
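The numbering effect described above can be illustrated with a small sketch. It uses Python's `re` module rather than `grep -P`, and the sample line and variable names are made up for illustration; the `(?:...)` syntax itself is the same in both.

```python
import re

# Hypothetical log line, similar in shape to what sample.log might contain.
line = "GET /index.html HTTP/1.1"

# Capturing: the method becomes group 1 and the path becomes group 2.
capturing = re.compile(r"(get|post)\s(\S+)", re.IGNORECASE)

# Non-capturing: the alternation is still grouped, but it takes no number,
# so the path becomes group 1.
non_capturing = re.compile(r"(?:get|post)\s(\S+)", re.IGNORECASE)

m1 = capturing.search(line)
m2 = non_capturing.search(line)

print(m1.groups())  # ('GET', '/index.html')
print(m2.groups())  # ('/index.html',)
```

Both patterns match the same text; only the set of numbered groups differs.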


2 Answers

Typically, non-capturing groups perform better than capturing groups, because they require less allocation of memory, and do not make a copy of the group match. However, there are three important caveats:

  • The difference is typically very small for simple, short expressions with short matches.
  • Starting a program like grep takes a significant amount of time and memory in itself, which may overwhelm any small improvement gained by using non-capturing groups.
  • Some languages implement capturing and non-capturing groups in the same way, causing the latter to give no performance improvement.
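One way to take process start-up out of the measurement is to time the match loop inside a single process and repeat it several times. The following is a minimal sketch under stated assumptions: it uses Python's `re` engine, not the PCRE engine behind `grep -P`, so the numbers are not comparable to the ones in the question, and the generated lines are a synthetic stand-in for `sample.log`, not its real content.

```python
import re
import timeit

# Synthetic stand-in for sample.log (assumed content, not the original file).
lines = [f"GET /page/{i} HTTP/1.1 200" for i in range(20000)]

patterns = {
    "capturing": re.compile(r"(get|post)\s[^\s]+", re.IGNORECASE),
    "non-capturing": re.compile(r"(?:get|post)\s[^\s]+", re.IGNORECASE),
}


def count_matches(pattern, lines):
    """Count the lines that contain a match, similar to grep -c."""
    return sum(1 for line in lines if pattern.search(line))


results = {}
for name, pattern in patterns.items():
    # Take the best of several repeats to reduce noise from other processes.
    best = min(
        timeit.repeat(lambda: count_matches(pattern, lines), number=5, repeat=3)
    )
    results[name] = (count_matches(pattern, lines), best)
    print(f"{name:>14}: {results[name][0]} lines, best of 3: {best:.4f}s per 5 passes")
```

If the two timings differ by less than the spread between repeats, the difference is probably noise rather than an effect of the group type.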
Pi Marillion answered Oct 21 '22 17:10


When the pattern uses more capturing groups, the difference seems to grow.

Thanks, everyone. :)

[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+" sample.log
20000

real    0m0.057s
user    0m0.051s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+" sample.log
20000

real    0m0.061s
user    0m0.053s
sys     0m0.006s
[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+(get|post)" sample.log
1880

real    0m0.839s
user    0m0.833s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+(?:get|post)" sample.log
1880

real    0m0.744s
user    0m0.741s
sys     0m0.003s
Mr.kang answered Oct 21 '22 19:10