
Need information on Grok patterns that use a non-capturing group (?: )

I understand the concept of writing regular expressions using capturing and non-capturing groups.

Ex:

a(b|c) would match ab and ac and capture b or c

a(?:b|c) would match ab and ac but capture nothing

But how is this useful when I write a new custom grok pattern, and what does it mean to use non-capturing groups there?

Looking at a few existing grok patterns like the one below for HOUR:

HOUR (?:2[0123]|[01]?[0-9])

Here we could match the hour format using (2[0123]|[01]?[0-9]) as well. Why does the grok pattern use the non-capturing expression here? On what basis should I decide to use (?:subex)?

sruthi asked Jul 08 '16 16:07



1 Answer

In Grok, the difference between a pattern with a capturing group and one without comes down to whether you need to create a field.

The (?:2[0123]|[01]?[0-9]) pattern contains a non-capturing group that is only used for grouping subpattern sequences. The (2[0123]|[01]?[0-9]) regex contains a numbered capturing group that matches and captures the value (that is, stores it in an additional buffer whose ID equals the position of the capture group in the pattern). Note that there are also named capture groups, like (?<field>2[0123]|[01]?[0-9]), that assign the captured value to a named group.
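To see the three group kinds side by side, here is a minimal sketch using Python's re module rather than Grok itself. It is only an illustration of the regex behaviour: Python spells named groups (?P<name>...) instead of the (?<name>...) form shown above, and the variable names are my own.

```python
import re

# The same HOUR expression written with the three kinds of group.
HOUR_NON_CAPTURING = r"(?:2[0123]|[01]?[0-9])"
HOUR_NUMBERED = r"(2[0123]|[01]?[0-9])"
HOUR_NAMED = r"(?P<hour>2[0123]|[01]?[0-9])"  # Python syntax for a named group

text = "23"

m_non_capturing = re.fullmatch(HOUR_NON_CAPTURING, text)
m_numbered = re.fullmatch(HOUR_NUMBERED, text)
m_named = re.fullmatch(HOUR_NAMED, text)

# All three match the same text; they differ only in what gets stored.
print(m_non_capturing.group(0), m_non_capturing.groups())  # nothing captured
print(m_numbered.group(0), m_numbered.groups())            # stored as group 1
print(m_named.group(0), m_named.groupdict())               # stored under a name
```

The match itself is identical in each case; only the numbered and named variants keep the value around for later use.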

With the named_captures_only parameter set to false, the a(b|c) regex will match ab or ac and assign b or c to a separate field. When you use a non-capturing group, as in a(?:b|c), no field is ever created; the text is only matched.

Since the default value of the named_captures_only parameter is true, the difference between a numbered capturing group and a non-capturing group disappears in Grok patterns as far as field creation is concerned. So, by default, only named captures (like a(?<myfield>b|c)) can be used to create fields.
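The effect of that setting can be imitated in plain Python. This is a rough sketch and not Grok's implementation: extract_fields is a hypothetical helper, using the group number as the field name for unnamed captures is my assumption, and the named group again uses Python's (?P<name>...) spelling.

```python
import re

def extract_fields(pattern, text, named_captures_only=True):
    """Hypothetical helper: build a dict of fields from a regex match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # Named groups always become fields.
    fields = dict(match.groupdict())
    if not named_captures_only:
        # Assumption for illustration: unnamed groups become fields
        # keyed by their group number.
        named_indexes = set(match.re.groupindex.values())
        for index in range(1, match.re.groups + 1):
            if index not in named_indexes:
                fields[str(index)] = match.group(index)
    return fields

# Default behaviour: numbered and non-capturing groups both yield no field.
print(extract_fields(r"a(b|c)", "ab"))
print(extract_fields(r"a(?:b|c)", "ab"))
# With the setting turned off, only the numbered group yields a field.
print(extract_fields(r"a(b|c)", "ab", named_captures_only=False))
print(extract_fields(r"a(?:b|c)", "ab", named_captures_only=False))
# A named group yields a field either way.
print(extract_fields(r"a(?P<myfield>b|c)", "ac"))
```

In this sketch the non-capturing version behaves the same regardless of the flag, which is the property the common patterns seem to rely on.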

I think the preference is given to non-capturing groups in common Grok patterns in order not to depend on the named_captures_only parameter setting.
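A related general regex consideration, which is my own addition and not something the Grok documentation is quoted on here: when a small pattern is embedded in a larger one, a numbered capturing group inside it shifts the numbering of every group that follows. The sketch below uses Python's re module, with the MINUTE expression and the composed patterns as assumed examples.

```python
import re

HOUR_NON_CAPTURING = r"(?:2[0123]|[01]?[0-9])"
HOUR_CAPTURING = r"(2[0123]|[01]?[0-9])"
MINUTE = r"(?:[0-5][0-9])"  # assumed MINUTE-like building block

# Compose a larger pattern in which only the minute is meant to be captured.
time_with_non_capturing_hour = rf"{HOUR_NON_CAPTURING}:({MINUTE})"
time_with_capturing_hour = rf"{HOUR_CAPTURING}:({MINUTE})"

m_clean = re.fullmatch(time_with_non_capturing_hour, "23:59")
m_shifted = re.fullmatch(time_with_capturing_hour, "23:59")

# With the non-capturing HOUR, the minute is group 1 as intended.
print(m_clean.groups())
# With the capturing HOUR, the hour takes group 1 and the minute moves to group 2.
print(m_shifted.groups())
```

So a reusable building block written with (?:...) can be dropped into other patterns without changing what their own groups capture.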

Wiktor Stribiżew answered May 23 '23 10:05
