I need to do a non-greedy match and hope someone can help me. I am using JavaScript and ASP, and I have the following:
match(/\href=".*?\/pdf\/.*?\.pdf/)
The pattern above starts matching at the first href attribute it finds. I need it to match only the last href, the one that points into the /pdf/ folder.
Any ideas?
... backing up until it can match an 'ab' (this is called backtracking). To make the quantifier non-greedy, you simply follow it with a '?': the first 3 characters are matched, and then the following 'ab' is matched.
Greedy: As Many As Possible (longest match)
For instance, take the + quantifier. It allows the engine to match one or more of the token it quantifies: \d+ can therefore match one or more digits. But "one or more" is rather vague: in the string 123, "one or more digits" (starting from the left) could be 1, 12 or 123.
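To make the greedy/lazy difference concrete, here is a minimal sketch in JavaScript. The sample string `xxxabxxxab` is a made-up input chosen for illustration; it does not come from the question.

```javascript
// Greedy vs. lazy quantifiers on the same inputs.
const digits = "123";
const greedyDigits = digits.match(/\d+/)[0];  // greedy: as many digits as possible
const lazyDigits = digits.match(/\d+?/)[0];   // lazy: as few digits as possible

// Hypothetical sample string, used only to show backtracking.
const sample = "xxxabxxxab";
const greedyAb = sample.match(/.*ab/)[0];   // .* grabs everything, then backs up to the last 'ab'
const lazyAb = sample.match(/.*?ab/)[0];    // .*? expands one character at a time until 'ab' follows

console.log(greedyDigits); // "123"
console.log(lazyDigits);   // "1"
console.log(greedyAb);     // "xxxabxxxab"
console.log(lazyAb);       // "xxxab"
```

Note that a lazy quantifier only changes where the match ends. The engine still starts at the leftmost position where a match is possible, which is why `.*?` alone does not skip an earlier href.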
You need to use capturing parentheses for sub-expression matches:
match(/\href=".*?(\/pdf\/.*?\.pdf)/)[1];
Match will return an array with the entire match at index 0; all sub-expression captures are added to the array in the order they matched. In this case, index 1 contains the section matching \/pdf\/.*?\.pdf.
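As a rough sketch of what that array looks like, the snippet below runs the pattern against a single link. The `html` variable and its contents are assumptions for illustration. The backslash before `h` is dropped here because it is not needed, and a null check is added because `match` returns `null` when nothing matches, in which case `[1]` would throw.

```javascript
// Assumed sample input containing one link into the /pdf/ folder.
const html = '<a href="dir/pdf/file.pdf">Some PDF</a>';

const result = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// Guard against null before indexing into the result.
const whole = result ? result[0] : null;     // the entire match
const captured = result ? result[1] : null;  // the first capture group

console.log(whole);    // href="dir/pdf/file.pdf
console.log(captured); // /pdf/file.pdf
```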
Try to make your regex more specific than just .*? if it's matching too broadly. For instance:
match(/\href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/)[1];
[^"]+? will lazily match a string of characters that doesn't contain the double quote character. This keeps the match inside the quotes, so it won't be too broad in the following string, for instance:
<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>
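Running both patterns against that string shows the difference. This is only a sketch: the variable names are made up, and the unnecessary backslash before `h` is omitted.

```javascript
const html = '<a href="someurl/somepage.html">Test</a>' +
             '<a href="dir/pdf/file.pdf">Some PDF</a>';

// .*? can cross the closing quote, so the match begins at the FIRST href.
const broad = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// [^"]+? cannot cross a quote, so the first href fails and the engine
// moves on to the second one.
const narrow = html.match(/href="([^"]+?\/pdf\/[^.]+?\.pdf)"/);

console.log(broad[0]);  // starts with href="someurl/somepage.html" and runs through file.pdf
console.log(broad[1]);  // /pdf/file.pdf
console.log(narrow[0]); // href="dir/pdf/file.pdf"
console.log(narrow[1]); // dir/pdf/file.pdf
```

The negated character class also lets the capture group hold the whole URL, not just the part from /pdf/ onward.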