
Regular Expression "Matching" vs "Capturing"

Tags:

regex

I've been working through regular expression tutorials to get the hang of them, and I was enjoying the one at this link right up until this problem: http://regexone.com/lesson/12

I cannot seem to figure out what the difference between "matching" and "capturing" is. Nothing I write seems to select the text under the "Capture" section (not even .*).

Edit: Here is an example from the tutorial that confuses me: (.* (.*)) is considered correct and (.* .*) is not. Is this a problem with the tutorial, or something I am not understanding?
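A minimal PHP sketch comparing the two patterns from the question. It assumes "Jan 1987" (the string used in the first answer) is representative of the tutorial's lines. Both patterns match the same text; only the nested one also captures the year on its own, which is presumably what the tutorial's "Capture" column checks.

```php
<?php
// Sample line; assumed to be representative of the tutorial's input.
$line = "Jan 1987";

// Both patterns match the whole line...
preg_match('/(.* (.*))/', $line, $nested);
preg_match('/(.* .*)/', $line, $flat);

// ...but only the nested pattern has a second group holding just the year.
print_r($nested); // indexes 0, 1 and 2; $nested[2] is "1987"
print_r($flat);   // indexes 0 and 1 only; both hold "Jan 1987"
```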

asked Jan 18 '14 05:01 by asimes




2 Answers

Matching:

When the engine matches part of the string (or the whole string) but returns nothing beyond that overall match.

Capturing:

When the engine matches part of the string (or the whole string) and also returns specific pieces of it: the contents of the capturing groups.

-- What does "returning" mean here?

When you need to check/store/validate/work with/love a part of the string that your regex matched, you need capturing groups (...).
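To make the distinction concrete, here is a small PHP sketch. Plain matching works as a yes/no test, while a capturing group hands back the part you care about. The `^`/`$` anchors and the variable names are illustrative choices, not taken from the tutorial.

```php
<?php
$input = "Jan 1987";

// Matching only: preg_match() returns 1 if the pattern is found, 0 if not.
// Enough for a yes/no check such as validation.
$isDate = preg_match('/^\w{3}\s+\d+$/', $input) === 1;

// Capturing: the parentheses around \d+ make the year available in $m[1].
$year = null;
if (preg_match('/^\w{3}\s+(\d+)$/', $input, $m)) {
    $year = (int) $m[1];
}

var_dump($isDate); // bool(true)
var_dump($year);   // int(1987)
```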

In your example, the regex .*?\d+ just matches the dates (month and year). See here.

The regex .*?(\d+) matches the whole date and captures the year. See here.

And (.*?(\d+)) matches the whole date and captures both the whole date and the year, in that order. See here.

*Please notice the bottom-right box titled "Match groups" on the linked pages.

So here is what gets returned in each case:

1:

preg_match("/.*?\d+/", "Jan 1987", $match);
print_r($match);

Output:

Array
(
    [0] => Jan 1987
)

2:

preg_match("/(.*?\d+)/", "Jan 1987", $match);
print_r($match);

Output:

Array
(
    [0] => Jan 1987
    [1] => Jan 1987
)

3:

preg_match("/(.*?(\d+))/", "Jan 1987", $match);
print_r($match);

Output:

Array
(
    [0] => Jan 1987
    [1] => Jan 1987
    [2] => 1987
)

As you can see in the last example, there are 2 capturing groups, at indexes 1 and 2 of the array. Index 0 always holds the whole matched string, even when the pattern has no capturing groups at all (as in example 1); index 0 is not itself a capturing group.
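preg_match() stops after the first match. To apply the same indexing to every line, preg_match_all() with the PREG_SET_ORDER flag returns one array per match, laid out exactly like the $match arrays above. The three sample lines are hypothetical, written in the tutorial's "Mon YYYY" style.

```php
<?php
// Hypothetical sample lines in the tutorial's "Mon YYYY" style.
$text = "Jan 1987\nMay 1969\nAug 2011";

// PREG_SET_ORDER: each entry is [whole match, group 1, group 2].
// '.' does not cross newlines here, so each line is matched separately.
preg_match_all('/(.*?(\d+))/', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "match: {$m[0]} | group 1: {$m[1]} | group 2: {$m[2]}\n";
}
```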

answered Oct 23 '22 04:10 by revo


Capturing in regexps means indicating that you're interested not only in matching (finding strings of characters that match your regular expression), but also in using specific parts of the matched string later on.

For example, the answer to the tutorial you linked to would be (\w{3}\s+(\d+)).

Now, why?

To simply match the date strings, it would be enough to write \w{3}\s+\d+ (3 word characters, followed by one or more whitespace characters, followed by one or more digits). Adding capture groups to the expression (a capture group is simply anything enclosed in parentheses ()) lets me later extract either the whole expression, using "$1", or just the year, using "$2". Groups are numbered by the position of their opening parenthesis, counting from the left: the outermost pair opens first, so it is group 1, and the pair around \d+ opens second, so it is group 2.
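In PHP the same $1/$2 references can be used in the replacement string of preg_replace(). The replacement is written in single quotes so PHP itself does not try to interpolate $1 and $2 as variables; the output format "date=..., year=..." is just an illustration.

```php
<?php
$line = "Jan 1987";

// $1 = outer group (whole date), $2 = inner group (year),
// numbered by the order of their opening parentheses.
$result = preg_replace('/(\w{3}\s+(\d+))/', 'date=$1, year=$2', $line);

echo $result, "\n"; // date=Jan 1987, year=1987
```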

Capture groups come in handy when you're interested not only in matching strings to a pattern, but also in extracting data from the matched strings or modifying them. For example, suppose you wanted to add 5 years to each of the dates in the tutorial: being able to extract just the year part of a matched string (using $2) would come in handy then.
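A sketch of that "add 5 years" idea in PHP, assuming hypothetical sample lines in the tutorial's format. preg_replace_callback() passes the callback the same array that preg_match() would fill, so the year is $m[2]. Because the year sits at the end of each match, the callback keeps everything before it unchanged and swaps in the new year.

```php
<?php
// Hypothetical sample lines in the tutorial's "Mon YYYY" style.
$text = "Jan 1987\nMay 1969\nAug 2011";

$shifted = preg_replace_callback(
    '/(\w{3}\s+(\d+))/',
    function (array $m): string {
        // $m[0] = whole match, $m[1] = outer group, $m[2] = year.
        // Keep the text before the year (month + whitespace) as-is.
        $prefix = substr($m[0], 0, -strlen($m[2]));
        return $prefix . ((int) $m[2] + 5);
    },
    $text
);

echo $shifted, "\n";
```

Using a callback rather than a plain replacement string is what makes the arithmetic possible: a string like '$2' can only copy the captured text, not compute with it.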

answered Oct 23 '22 03:10 by radai