I've been looking up regular expression tutorials trying to get the hang of them and was enjoying the tutorial in this link right up until this problem: http://regexone.com/lesson/12
I cannot seem to figure out what the difference between "matching" and "capturing" is. Nothing I write seems to select the text under the "Capture" section (not even .*).
Edit: Here is an example from the tutorial that confuses me: (.* (.*)) is considered correct and (.* .*) is not. Is this a problem with the tutorial or something I am not understanding?
A Match is an object that indicates a particular regular expression matched (a portion of) the target text. A Group indicates a portion of a match, if the original regular expression contained group markers (basically a pattern in parentheses).
Matches(String, Int32): Searches the specified input string for all occurrences of a regular expression, beginning at the specified starting position in the string.
Matches(String): Searches the specified input string for all occurrences of a regular expression.
Avoid coding in regex if you can. Don't solve important problems with regex. Regex is expensive: it is often the most CPU-intensive part of a program, and a non-matching regex can be even more expensive to check than a matching one.
Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow.
Matching: the engine matches part or all of the string but does not return anything.
Capturing: the engine matches part or all of the string and does return something.
-- What's the meaning of returning?
When you need to check/store/validate/work with/love a part of the string that your regex has matched, you need capturing groups (...)
In your example, the regex .*?\d+ just matches the dates and years. See here
The regex .*?(\d+) matches the whole and captures the year. See here
And (.*?(\d+)) will match the whole and capture the whole and the year, respectively. See here
*Please notice the bottom right box titled Match groups
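The same distinction explains the two patterns from the question. As a rough sketch in Python (not the tutorial's own engine; the sample string is an assumption about what the lesson's input looks like), both patterns match the same text, but only the nested one captures the year as a separate group:

```python
import re

# Assumed sample line, similar to the dates used in the thread.
sample = "Jan 1987"

nested = re.search(r"(.* (.*))", sample)  # outer group + inner group
flat = re.search(r"(.* .*)", sample)      # outer group only

# group(0) is the overall match; groups() lists only the captured groups.
nested_groups = nested.groups()
flat_groups = flat.groups()

print(nested.group(0), nested_groups)  # Jan 1987 ('Jan 1987', '1987')
print(flat.group(0), flat_groups)      # Jan 1987 ('Jan 1987',)
```

So (.* .*) is not wrong as a match; it just never hands the year back on its own, which is presumably what the lesson checks for.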
So returning....
preg_match("/.*?\d+/", "Jan 1987", $match);
print_r($match);
Output:
Array
(
[0] => Jan 1987
)
preg_match("/(.*?\d+)/", "Jan 1987", $match);
print_r($match);
Output:
Array
(
[0] => Jan 1987
[1] => Jan 1987
)
preg_match("/(.*?(\d+))/", "Jan 1987", $match);
print_r($match);
Output:
Array
(
[0] => Jan 1987
[1] => Jan 1987
[2] => 1987
)
So as you can see in the last example, we have 2 capturing groups, indexed at 1 and 2 in the array. Index 0 is always the whole matched string, even though no group captured it.
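For readers who do not use PHP, here is a rough Python analogue of the three preg_match calls above. The helper name groups_for is made up for this illustration; it builds a list shaped like PHP's $match array, with the whole match at index 0 followed by each capturing group.

```python
import re

def groups_for(pattern, subject):
    """Return [whole match, group 1, group 2, ...], or [] if nothing matches.

    Hypothetical helper: mimics the shape of PHP's $match array.
    """
    m = re.search(pattern, subject)
    if m is None:
        return []
    return [m.group(0), *m.groups()]

text = "Jan 1987"
no_groups = groups_for(r".*?\d+", text)
one_group = groups_for(r"(.*?\d+)", text)
two_groups = groups_for(r"(.*?(\d+))", text)

print(no_groups)   # ['Jan 1987']
print(one_group)   # ['Jan 1987', 'Jan 1987']
print(two_groups)  # ['Jan 1987', 'Jan 1987', '1987']
```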
Capturing in regexps means indicating that you're interested not only in matching (which is finding strings of characters that match your regular expression), but also in using specific parts of the matched string later on.
For example, the answer to the tutorial you linked to would be (\w{3}\s+(\d+)).
Now, why?
To simply match the date strings it would be enough to write \w{3}\s+\d+ (3 word characters, followed by one or more whitespace characters, followed by one or more digits). But adding capture groups to the expression (a capture group is simply anything enclosed in parentheses ()) will allow me to later extract either the whole expression (using "$1", because the outermost pair of parentheses is the 1st one the parser encounters) or just the year (using "$2", because the parentheses around the \d+ are the 2nd pair the parser encounters).
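A small sketch of that numbering, written in Python as an assumption (Python spells the references \1 and \2 in replacement strings instead of $1 and $2; the input sentence is invented for the example):

```python
import re

date_pattern = re.compile(r"(\w{3}\s+(\d+))")

# Invented input; only "May 1969" fits the pattern.
m = date_pattern.search("released in May 1969, roughly")

whole_date = m.group(1)  # 1st opening parenthesis: the whole date
year_only = m.group(2)   # 2nd opening parenthesis: just the digits

# In a replacement string, \2 plays the role of "$2".
replaced = date_pattern.sub(r"\2", "May 1969")

print(whole_date)  # May 1969
print(year_only)   # 1969
print(replaced)    # 1969
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group comes first even though it closes last.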
Capture groups come in handy when you're interested not only in matching strings against a pattern, but also in extracting data from the matched strings or modifying them in some way. For example, suppose you wanted to add 5 years to each of those dates in the tutorial: being able to extract just the year part from a matched string (using $2) would come in handy then.
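As a minimal sketch of that "add 5 years" idea, assuming Python and the pattern from this answer (the function name add_years and the sample dates are made up for illustration):

```python
import re

DATE = re.compile(r"(\w{3}\s+(\d+))")

def add_years(text, years=5):
    """Return text with the year of every matched date shifted by `years`."""
    def bump(m):
        whole, year = m.group(1), m.group(2)
        # Keep everything in the date before the year (month and spacing).
        prefix = whole[: m.start(2) - m.start(1)]
        return prefix + str(int(year) + years)
    return DATE.sub(bump, text)

shifted = add_years("Jan 1987, May 1969, Aug 2011")
print(shifted)  # Jan 1992, May 1974, Aug 2016
```

Without the inner group you would still match each date, but you would have to pick the year out of the matched text by hand.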