Regular expression for anchor tag with all attributes

I'm trying to write a regular expression that replaces every link in a text string with the link's text.

A link may look like these:

<a href="http://whatever" id="an_id" rel="a_rel">the link</a>
<a href="/absolute_url/whatever" id="an_id" rel="a_rel">the link</a>

I want a regular expression that gives me: the link

Lobo asked Feb 06 '12 09:02



2 Answers

/<a[\s]+([^>]+)>((?:.(?!\<\/a\>))*.)<\/a>/g

This one will match any <a ...>...</a> element, including ones whose content contains a stray < or complete nested tags, such as:

blah blah <a href="test.html">This line contains an HTML opening < bracket.</a> blah blah
blah blah <a href="test.html">This line contains <strong>bold</strong> text.</a> blah blah

Would capture:

<a href="test.html">This line contains an HTML opening < bracket.</a>
  • with capture groups:
    • href="test.html"
    • This line contains an HTML opening < bracket.

and

<a href="test.html">This line contains <strong>bold</strong> text.</a>
  • with capture groups:
    • href="test.html"
    • This line contains <strong>bold</strong> text.

It also includes capturing groups for the tag's attributes (like class="", href="", etc.) and for its content (what is between the opening and closing tags); these can be removed if you do not need them.
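To tie this back to the question, here is a minimal sketch of using the second capture group to replace each link with its content. It assumes Python's re module (the question does not name a language), and the function name replace_links_with_text is made up for illustration. The pattern is the one above with the unnecessary backslashes dropped; re.sub replaces every match, so no "g" flag is needed.

```python
import re

# Same pattern as above, written for Python's re module (an assumption;
# the question does not say which regex flavor is in use).
# Group 1 = the tag's attributes, group 2 = the content between the tags.
LINK_RE = re.compile(r'<a[\s]+([^>]+)>((?:.(?!</a>))*.)</a>')


def replace_links_with_text(html: str) -> str:
    """Replace every <a ...>...</a> element with its content (group 2)."""
    return LINK_RE.sub(lambda m: m.group(2), html)


sample = 'blah <a href="http://whatever" id="an_id" rel="a_rel">the link</a> blah'
print(replace_links_with_text(sample))  # prints: blah the link blah
```

Nested markup such as <strong> is kept as part of the replacement, since group 2 holds everything between the opening and closing tags.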

If you want to match content that spans multiple lines, add an "s" flag (which lets . match newlines) before or after the "g" flag at the end. Note that the "s" flag is not supported in all flavors of regular expression.
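As a rough illustration of what the "s" flag changes, the sketch below runs the same pattern with and without it. It assumes Python, where the equivalent of "s" is re.DOTALL; the sample string is invented for the example.

```python
import re

# The pattern from this answer, written for Python's re module (an assumption).
PATTERN = r'<a[\s]+([^>]+)>((?:.(?!</a>))*.)</a>'

# A link whose content spans two lines (made-up sample).
multiline = '<a href="test.html">first line\nsecond line</a>'

# Without the flag, "." cannot cross the newline, so there is no match.
without_s = re.search(PATTERN, multiline)

# With re.DOTALL (Python's "s" flag), "." also matches the newline.
with_s = re.search(PATTERN, multiline, re.DOTALL)

print(without_s)        # prints: None
print(with_s.group(2))
```

The attribute part is not affected either way, because [\s] and [^>] already match newlines without the flag.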

Capture example (not using the "s" flag - not supported by regexr yet): http://regexr.com/39rsv

Jim answered Sep 23 '22 18:09


/<a[^>]*>([^<]+)<\/a>/g

It's far from perfect (it only matches links whose content contains no <), but you need to provide more examples of what is a correct match and what isn't (e.g. what about whitespace?)
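A short sketch of where this simpler pattern works and where it gives up, assuming Python's re module (the variable names are illustrative only). The first sample comes from the question, the second from the other answer.

```python
import re

# The simpler pattern from this answer, written for Python's re module.
# Group 1 = link content, which may not contain any "<".
SIMPLE_RE = re.compile(r'<a[^>]*>([^<]+)</a>')

plain = '<a href="/absolute_url/whatever" id="an_id" rel="a_rel">the link</a>'
nested = '<a href="test.html">This line contains <strong>bold</strong> text.</a>'

# A link with plain text content is replaced by that text.
plain_result = SIMPLE_RE.sub(r'\1', plain)

# A link containing another tag does not match, so it is left unchanged.
nested_result = SIMPLE_RE.sub(r'\1', nested)

print(plain_result)             # prints: the link
print(nested_result == nested)  # prints: True
```

If the links in your text never contain nested tags or a stray <, this shorter pattern may be all you need.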

F.P answered Sep 22 '22 18:09