I need to extract a hyperlink that contains a specific word in its URL from a piece of text. Example:
This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
So I need to parse this text, check the hyperlinks to see whether one of them contains the word "specificword", and then extract the entire hyperlink. I would then end up with this:
<a href="/server/specificword.htm">this is a link to a page</a>
I need the hyperlink that has specificword in the URL (e.g. /server/specificword.htm), not in the link text.
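One way to honour the "URL, not link text" requirement is to split the job in two: find each anchor, then test only its href value. A minimal Python sketch, assuming well-formed, single-line anchors with quoted href values and no ">" inside attribute values; TEXT and links_with_word_in_url are hypothetical names, and TEXT is the example above with its href values quoted:

```python
import re

# The question's sample text, with the closing quotes of the href values in place.
TEXT = """This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
"""

# Step 1: every <a ...>...</a> element (assumes anchors are not nested and
# do not span lines, since . does not match a newline here).
ANCHOR = re.compile(r'<a\b[^>]*>.*?</a>')
# Step 2: the href value inside an opening tag, quoted with " or '.
HREF = re.compile(r'''\bhref\s*=\s*(["'])(.*?)\1''')


def links_with_word_in_url(text, word):
    """Return the full anchors whose href value (not link text) contains word."""
    found = []
    for m in ANCHOR.finditer(text):
        anchor = m.group(0)
        # Only search the opening tag, so the link text is never inspected.
        open_tag = anchor[:anchor.index('>') + 1]
        href = HREF.search(open_tag)
        if href and word in href.group(2):
            found.append(anchor)
    return found


print(links_with_word_in_url(TEXT, "specificword"))
```

Because the word test runs only on the href value, the second link is skipped even though its link text contains "specificword".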
One regex I have tried is this one: /(<a[^>]*>.*?</a>)|specificword/
This matches every hyperlink in the text, or the bare word "specificword" on its own. If the text contains several links without "specificword" in them, I get those too.
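To see why the alternation misbehaves, here is a small run of that pattern against the example text (Python's re is used only for illustration; TEXT is a hypothetical name for the sample with its href values quoted). Any anchor satisfies the first branch regardless of its URL, and the bare word satisfies the second:

```python
import re

TEXT = """This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
"""

# The first attempt from the question: (a whole link) OR (the bare word).
attempt = re.compile(r'(<a[^>]*>.*?</a>)|specificword')

# At each position the branches are tried left to right, so any anchor
# matches the first branch whatever its href is; the word alone matches the second.
matches = [m.group(0) for m in attempt.finditer(TEXT)]
for s in matches:
    print(s)
```

This yields three matches: the first link, the bare word from the sentence in between, and the second link.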
I have also tried this one, but it matches nothing:
<a.*?href\s*=\s*["\']([^"\'>]*specificword[^"\'>]*)["\'][^>]*>.*?<\/a>
My regex skills end here; any help would be great.
Try this to match the whole a tag (SPECIFICWORD is a placeholder for your word; the match is case-sensitive):
/<a [^>]*\bhref\s*=\s*"[^"]*SPECIFICWORD.*?<\/a>/
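A quick check of this pattern, sketched in Python for illustration (Python's re accepts the same syntax once the / delimiters are dropped). WORD is a hypothetical stand-in for SPECIFICWORD and is passed through re.escape in case it contains regex metacharacters:

```python
import re

TEXT = """This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
"""
WORD = "specificword"  # stands in for SPECIFICWORD

# [^"]* keeps the search for WORD inside the quoted href value;
# .*?</a> then runs lazily to the first closing tag.
full_tag = re.compile(r'<a [^>]*\bhref\s*=\s*"[^"]*' + re.escape(WORD) + r'.*?</a>')

print(full_tag.findall(TEXT))
```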
Or, to get just the link (in the first capture group):
/<a [^>]*\bhref\s*=\s*"([^"]*SPECIFICWORD[^"]*)/
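If you use PHP, to get just the link (\K drops everything matched before it from the match, so $results[0] holds the URLs themselves):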
preg_match_all('/<a [^>]*\bhref\s*=\s*"\K[^"]*SPECIFICWORD[^"]*/', $text, $results);
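Python's built-in re module does not support \K, so a sketch of the same extraction there (assuming Python; WORD and TEXT are hypothetical names) reads the URL from the capture group instead. When a pattern has exactly one group, re.findall returns that group's contents:

```python
import re

TEXT = """This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
"""
WORD = "specificword"  # stands in for SPECIFICWORD

# Same as the capture-group pattern above; group 1 is the URL.
link_only = re.compile(r'<a [^>]*\bhref\s*=\s*"([^"]*' + re.escape(WORD) + r'[^"]*)')

links = link_only.findall(TEXT)
print(links)
```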
This one should suit your needs:
<a href="[^"]*?specificword.*?">.*?</a>
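A short Python check of this pattern (Python used for illustration only). It also shows the pattern's limit: because it spells out <a href=" literally, an anchor with another attribute before href is not matched (the class="nav" markup is a made-up example):

```python
import re

TEXT = """This is a text with a link to some page. Click this link <a href="/server/specificword.htm">this is a link to a page</a>
to see that page. Here is a link that doesn't have the word "specificword" in it: <a href="/server/mypage.htm">this is a link without the word "specificword" in the url</a>
"""
simple = re.compile(r'<a href="[^"]*?specificword.*?">.*?</a>')

print(simple.findall(TEXT))

# Made-up markup: another attribute comes first, so '<a href="' never occurs.
print(simple.findall('<a class="nav" href="/server/specificword.htm">x</a>'))
```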
If you want to allow other attributes on your anchor tag and be more permissive about what surrounds the href, you could try:
<a( [^>]*?)? href="[^"]*?specificword.*?"( .*?)?>.*?</a>
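With capturing groups, Python's re.findall returns the groups rather than the whole anchor, which is one practical reason to prefer non-capturing groups. A sketch in Python, with made-up markup:

```python
import re

with_attrs = re.compile(r'<a( [^>]*?)? href="[^"]*?specificword.*?"( .*?)?>.*?</a>')
html = '<a class="nav" href="/server/specificword.htm" target="_blank">a page</a>'

m = with_attrs.search(html)
print(m.group(0))                # the whole anchor
print(m.groups())                # the attributes before and after href
print(with_attrs.findall(html))  # findall returns the groups, not the anchor
```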
You could also, of course, use non-capturing groups (?:...):
<a(?: [^>]*?)? href="[^"]*?specificword.*?"(?: .*?)?>.*?</a>
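With the groups made non-capturing, re.findall in Python returns whole anchors again, and links whose href lacks the word are skipped (made-up markup, Python for illustration only):

```python
import re

non_capturing = re.compile(r'<a(?: [^>]*?)? href="[^"]*?specificword.*?"(?: .*?)?>.*?</a>')
html = ('<a class="nav" href="/server/specificword.htm" target="_blank">a page</a> '
        '<a href="/server/mypage.htm">other</a>')

print(non_capturing.findall(html))
```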
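And finally, if you want to allow single quotes for your href attribute as well:
<a(?: [^>]*?)? href=(["'])(?:(?!\1).)*?specificword.*?\1(?: .*?)?>.*?</a>
A backreference cannot be used inside a character class: [^\1] would be read as "any character except \x01", not "any character except the opening quote", so the tempered (?:(?!\1).)*? is used to stay within the quoted value.
Last but not least: if you want to capture the URL, just put parentheses around the (?:(?!\1).)*?specificword.*? part; the URL then ends up in the second capture group (the first one holds the quote character):
<a(?: [^>]*?)? href=(["'])((?:(?!\1).)*?specificword.*?)\1(?: .*?)?>.*?</a>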
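The following Python sketch (Python chosen for illustration; html is made-up markup) compares a [^\1]*? form of this pattern with a tempered (?:(?!\1).)*? form. Inside a character class a backreference is not available, so [^\1] is read as "any character except \x01" and can run past the closing quote into another attribute:

```python
import re

# [^\1] form: inside [...], \1 is the octal escape \x01, not "the captured quote".
buggy = re.compile(r'''<a(?: [^>]*?)? href=(["'])([^\1]*?specificword.*?)\1(?: .*?)?>.*?</a>''')
# Tempered form: consume one character at a time, but never the opening quote.
fixed = re.compile(r'''<a(?: [^>]*?)? href=(["'])((?:(?!\1).)*?specificword.*?)\1(?: .*?)?>.*?</a>''')

html = ("<a href='/server/specificword.htm'>single quotes</a> "
        '<a href="/server/specificword2.htm">double quotes</a> '
        '<a href="/server/mypage.htm" title="specificword">word only in title</a>')

# The captured URL is in group 2 (group 1 is the quote character).
print([m.group(2) for m in buggy.finditer(html)])
print([m.group(2) for m in fixed.finditer(html)])
```

The [^\1] version also reports the third link, capturing /server/mypage.htm" title="specificword, while the tempered version returns only the two links whose href actually contains the word.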