regexp for finding everything between <a> and </a> tags

Tags: regex, php

I'm trying to find a way to make a list of everything between <a> and </a> tags. So I have a list of links and I want to get the names of the links (not where the links go, but what they're called on the page). Would be really helpful to me.

Currently I have this:

$lines = preg_split("/\r?\n|\r/", $content);  // content is the given page
foreach ($lines as $val) {
  if (preg_match("/(<A(.*)>)(<\/A>)/", $val, $alink)) {     
    $newurl = $alink[1];

    // put in array of found links
    $links[$index] = $newurl;
    $index++;
    $is_href = true;
  }
}

asked Dec 05 '08 07:12

Vikram Haer

4 Answers

The standard disclaimer applies: Parsing HTML with regular expressions is not ideal. Success depends on the well-formedness of the input on a character-by-character level. If you cannot guarantee this, the regex will fail to do the Right Thing at some point.

Having said that:

<a\b[^>]*>(.*?)</a>   // match group one will contain the link text
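
As a rough sketch of how this pattern might be applied in PHP: the sample markup, the `~` delimiters, and the `i` and `s` flags are assumptions added here, not part of the answer. It runs the pattern over the whole page at once instead of line by line, and the same well-formedness caveat applies.

```php
<?php
// Sample markup, made up for this sketch.
$content = <<<HTML
<p><a href="/one">First link</a> and
<A HREF="/two" title="t">Second
link</A></p>
HTML;

// The pattern from the answer, wrapped in "~" delimiters. The flags are additions:
// "i" also matches <A ...>, and "s" lets "." match across line breaks.
preg_match_all('~<a\b[^>]*>(.*?)</a>~is', $content, $matches);

// Group one holds the text between the tags.
// Strip any nested markup and collapse whitespace into single spaces.
$names = [];
foreach ($matches[1] as $inner) {
    $names[] = trim(preg_replace('/\s+/', ' ', strip_tags($inner)));
}

var_export($names);
```

Because the whole page is matched in one call, link text that spans several lines is still captured, which the line-by-line loop in the question would miss.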

answered Oct 05 '22 17:10

Tomalak

I'm a big fan of regexes, but this is not the right place to use them.

Use a real HTML parser.

Your code will be clearer
It will be more likely to work

I Googled for a PHP HTML parser, and found this one.

If you know you're working with XHTML, then you could use PHP's standard XML parser.
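
A minimal sketch of the XHTML route, assuming the input is well-formed and has no default namespace (a real XHTML document with `xmlns` would need the namespace registered before the XPath query). SimpleXML is used here as one of PHP's bundled XML APIs; the sample markup is invented.

```php
<?php
// Well-formed XHTML fragment, made up for this sketch.
$xhtml = <<<XML
<div>
  <a href="/one">First link</a>
  <a href="/two">Second <b>bold</b> link</a>
  <span>not a link</span>
</div>
XML;

$xml = simplexml_load_string($xhtml);
if ($xml === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

$names = [];
foreach ($xml->xpath('//a') as $a) {
    // Casting $a to string would skip text inside nested elements such as <b>,
    // so read the full text content through the equivalent DOM node instead.
    $names[] = dom_import_simplexml($a)->textContent;
}

var_export($names);
```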

answered Oct 05 '22 17:10

slim

<a\s*(.*)\>(.*)</a>

<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a>

$1 = href="http://www.stackoverflow.com"

$2 = Go to stackoverflow.com

I answered a similar question to strip everything except a tags here
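
One caveat with this pattern is that both groups are greedy, so a line holding more than one link is captured as a single match. The sketch below illustrates that on an invented input line and compares it with a lazy variant; the `~` delimiters and the lazy rewrite are assumptions added here, not part of the answer.

```php
<?php
// Two links on one line, made up for this sketch.
$line = '<a href="/a">A</a> and <a href="/b">B</a>';

// Pattern as posted: both groups are greedy.
preg_match('~<a\s*(.*)\>(.*)</a>~', $line, $greedy);

// Same pattern with lazy quantifiers, applied to every match on the line.
preg_match_all('~<a\s*(.*?)\>(.*?)</a>~', $line, $lazy);

var_export($greedy[1]); // everything from the first href up to the second link's attributes
echo PHP_EOL;
var_export($greedy[2]); // only the text of the last link
echo PHP_EOL;
var_export($lazy[2]);   // the text of each link separately
```

Note that `<a\s*` also matches other tags that merely start with "a", such as `<abbr>`, so a word boundary (`<a\b`) is safer.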

answered Oct 05 '22 18:10

Xetius

If I am going to complain about all of the regex solutions, I suppose I need to actually demonstrate how to use a proper HTML parser. The OP gives no indication that the HTML to be parsed is in any way invalid, so a legitimate parser is absolutely appropriate for script stability and quality.

My advice does require that you become familiar with the basics of DOMDocument (and optionally DOMXPath), but once you understand the components involved you will see that the syntax is far less cryptic than a regular expression. For this reason, I will also argue that this technique improves the overall readability of your script, both for you and for future readers of your code.

Code:

$html = <<<HTML
<a href="#">hello</a> <abbr href="#">FYI</abbr> <a title="goodbye">later</a>
<a href=https://example.com>no quoted attributes</a>
<A href="https://example.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\'">a link with data attribute</A>
and
this is <a title="hello">not a hyperlink</a> but simply an anchor tag
HTML;

$dom = new DOMDocument; 
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$linkText = [];
foreach ($xpath->evaluate("//a[@href]") as $node) {
    $linkText[] = $node->nodeValue;
}
var_export($linkText);

Output:

array (
  0 => 'hello',
  1 => 'no quoted attributes',
  2 => 'a link with data attribute',
)

If you don't care whether the href attribute exists:

Code:

$doc = new DOMDocument();
$doc->loadHTML($html);
$aTags = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $aTags[] = $a->nodeValue;
}
var_export($aTags);

Output:

array (
  0 => 'hello',
  1 => 'later',
  2 => 'no quoted attributes',
  3 => 'a link with data attribute',
  4 => 'not a hyperlink',
)
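
To connect this back to the `$content` variable in the question, the second snippet could be wrapped in a small function along these lines. This is only a sketch: `linkNames` is a hypothetical name, and the warning suppression and whitespace cleanup are additions that the answer does not include but that tend to be useful on real-world pages.

```php
<?php
/**
 * Hypothetical helper (not from the answer): returns the text of every <a> element.
 */
function linkNames(string $content): array
{
    if (trim($content) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Collapse runs of whitespace so multi-line link text becomes one line.
        $names[] = trim(preg_replace('/\s+/', ' ', $a->textContent));
    }
    return $names;
}

// Invented sample: link text split over two lines, plus an unclosed final tag.
var_export(linkNames('<p><a href="/x">Read
   more</a> or <a name="top">Top</a></p><p>unclosed <a href="/y">Last'));
```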

answered Oct 05 '22 17:10

mickmackusa
