 

Non greedy regex match, JavaScript and ASP

I need to do a non-greedy match and hope someone can help me. I am using JavaScript and ASP, and I have the following:

match(/\href=".*?\/pdf\/.*?\.pdf/)

The match above starts at the first href in the string. I need it to match only the last href, the one that points into the /pdf/ folder.

Any ideas?

Gerald Ferreira asked Mar 11 '10 12:03





1 Answer

You need to use capturing parentheses for sub-expression matches:

match(/href=".*?(\/pdf\/.*?\.pdf)/)[1];

match returns an array with the entire match at index 0, followed by the sub-expression captures in the order their opening parentheses appear in the pattern. In this case, index 1 contains the section matching \/pdf\/.*?\.pdf. If nothing matches, match returns null, so check the result before indexing into it.
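To make the array layout concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration, and it only assumes a standard JavaScript engine:

```javascript
// Hypothetical sample markup, used only for illustration.
const html = '<a href="dir/pdf/file.pdf">Some PDF</a>';

// With a non-global regex, String.prototype.match returns an array
// (index 0 = whole match, index 1 = first capture group) or null.
const result = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// Guard against null before indexing.
const whole = result ? result[0] : null;
const pdfPath = result ? result[1] : null;

console.log(whole);   // href="dir/pdf/file.pdf
console.log(pdfPath); // /pdf/file.pdf

// A string without a /pdf/ link yields null instead of an array.
const noMatch = '<a href="page.html">x</a>'.match(/href=".*?(\/pdf\/.*?\.pdf)/);
console.log(noMatch); // null
```

Note that the capture starts at /pdf/, so the leading "dir" is not part of index 1.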


Try and make your regex more specific than just .*? if it's matching too broadly. For instance:
match(/href="([^"]+?\/pdf\/[^.]+?\.pdf)"/)[1];

[^"]+? lazily matches a run of characters that contains no double quote. This keeps the match inside a single pair of quotes, so it cannot start at an earlier href and run on into a later one, as it otherwise would in a string like this:

<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>
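As a rough comparison, the sketch below runs both patterns against that string. The variable names are illustrative only; the point is that the lazy .*? still begins at the first href, while the [^"] version cannot cross a closing quote:

```javascript
const html = '<a href="someurl/somepage.html">Test</a>' +
             '<a href="dir/pdf/file.pdf">Some PDF</a>';

// Lazy dot: the match still starts at the FIRST href and stretches
// across both tags until /pdf/ is found.
const broad = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);
console.log(broad[0]);
// href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf
console.log(broad[1]); // /pdf/file.pdf

// Negated character class: the attempt at the first href fails because
// [^"] cannot step over the closing quote, so only the second href matches.
const strict = html.match(/href="([^"]+?\/pdf\/[^.]+?\.pdf)"/);
console.log(strict[0]); // href="dir/pdf/file.pdf"
console.log(strict[1]); // dir/pdf/file.pdf
```

One trade-off of [^.]+? is that it rejects file names containing an extra dot, such as report.v2.pdf; [^"]+? in that position would be a looser alternative.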
Andy E answered Oct 02 '22 15:10
