I understand the concept of writing regular expressions using capturing and non-capturing groups.
For example:
a(b|c)
would match ab and ac, and capture b or c.
a(?:b|c)
would match ab and ac but capture nothing.
But how is this useful when I write a new custom grok pattern, and what does it mean to use non-capturing groups there?
Looking at a few existing grok patterns like the one below for HOUR:
HOUR (?:2[0123]|[01]?[0-9])
Here we could match the hour format using (2[0123]|[01]?[0-9]) as well.
What makes the grok pattern use the non-capturing expression here? Based on what criteria should I decide to use (?:subex)?
In Grok, the difference between a capturing and a non-capturing group comes down to whether you need to create a field.
The (?:2[0123]|[01]?[0-9]) pattern contains a non-capturing group, which is only used to group a sequence of subpatterns. The (2[0123]|[01]?[0-9]) regex contains a numbered capturing group that matches and captures the value (that is, stores it in an additional buffer whose ID equals the position of the capture group in the pattern). Note that there are also named capture groups, like (?<field>2[0123]|[01]?[0-9]), which assign the captured value to a named group.
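To make the three kinds of group concrete, here is a minimal sketch in Python rather than Grok. It only illustrates the regex behaviour, not how Logstash works internally. Note one assumption about syntax: Python's re module spells a named group as (?P<name>...), while Grok (which uses Oniguruma) accepts (?<name>...). The variable names are made up for the example.

```python
import re

# The alternation from the HOUR pattern, without any surrounding group.
HOUR_BODY = r"2[0123]|[01]?[0-9]"

# Same alternation wrapped in the three kinds of group.
non_capturing = re.compile(rf"(?:{HOUR_BODY})")
numbered = re.compile(rf"({HOUR_BODY})")
named = re.compile(rf"(?P<hour>{HOUR_BODY})")  # Grok would write (?<hour>...)

text = "23"

m_non_capturing = non_capturing.fullmatch(text)
m_numbered = numbered.fullmatch(text)
m_named = named.fullmatch(text)

# All three match the same text...
print(m_non_capturing.group(0), m_numbered.group(0), m_named.group(0))

# ...but they differ in what is stored alongside the match.
print(m_non_capturing.groups())   # no groups at all
print(m_numbered.groups())        # one numbered group
print(m_named.groupdict())        # one group, addressable by name
```

All three patterns accept exactly the same input; the only difference is whether, and under which key, the matched value is kept.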
With the named_captures_only parameter set to false, the a(b|c) regex will match ab or ac and assign the captured b or c to a separate field. When you use a non-capturing group, a(?:b|c), no field is ever created; the text is only matched.
Since the default value of named_captures_only is true, the difference between a numbered capturing group and a non-capturing group disappears in Grok patterns by default. So, by default, only named captures (like a(?<myfield>b|c)) can be used to create fields.
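The effect of that setting can be imitated with a small Python sketch. This is not Logstash's implementation: extract_fields is a hypothetical helper, its named_captures_only argument only mimics the Grok option of the same name, and the group_N field names used for numbered groups are an assumption made for illustration. Python's (?P<name>...) stands in for Grok's (?<name>...).

```python
import re


def extract_fields(pattern, text, named_captures_only=True):
    """Return a dict of fields for the first match, or None if nothing matches.

    Hypothetical helper that imitates the behaviour described above.
    """
    match = re.search(pattern, text)
    if match is None:
        return None

    # Named groups always produce fields.
    fields = dict(match.groupdict())

    if not named_captures_only:
        # Numbered (unnamed) groups produce fields only when the option is off.
        named_indexes = set(match.re.groupindex.values())
        for index, value in enumerate(match.groups(), start=1):
            if index not in named_indexes:
                fields[f"group_{index}"] = value  # assumed field name

    return fields


# Default: a numbered group matches but creates no field.
print(extract_fields(r"a(b|c)", "ab"))
# Option off: the numbered group now creates a field.
print(extract_fields(r"a(b|c)", "ab", named_captures_only=False))
# A non-capturing group never creates a field, whatever the option.
print(extract_fields(r"a(?:b|c)", "ab", named_captures_only=False))
# A named group creates a field in both modes.
print(extract_fields(r"a(?P<myfield>b|c)", "ac"))
```

In this sketch, the non-capturing pattern is the only one whose result does not change when the option is flipped.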
I think non-capturing groups are preferred in the common Grok patterns so that they do not depend on the named_captures_only parameter setting.