I'm trying to find the links on a page.
My regex is:
/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
but it seems to fail on
<a title="this" href="that">what?</a>
How would I change my regex to deal with href not placed first in the a tag?
Use the querySelector() method to get an element by its href attribute, e.g. document.querySelector('a[href="https://example.com"]'). The method returns the first element that matches the selector, or null if no element in the DOM matches it.
Use getAttribute() to get the href in JavaScript. The Element interface's getAttribute() method returns the value of a specified attribute on the element.
The href attribute specifies the URL of the page the link goes to. If the href attribute is not present, the <a> tag will not be a hyperlink. Tip: You can use href="#top" or href="#" to link to the top of the current page!
Reliable regexes for HTML are difficult to write. Here is how to do it with DOM:
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHTML($node), PHP_EOL;
}
The above would find and output the "outerHTML" of all A elements in the $html string.
To get all the text values of the node, you do
echo $node->nodeValue;
To check if the href attribute exists you can do
echo $node->hasAttribute( 'href' );
To get the href attribute you'd do
echo $node->getAttribute( 'href' );
To change the href attribute you'd do
$node->setAttribute('href', 'something else');
To remove the href attribute you'd do
$node->removeAttribute('href');
You can also query for the href attribute directly with XPath
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach ($nodes as $href) {
    echo $href->nodeValue;                       // echo current attribute value
    $href->nodeValue = 'new value';              // set new attribute value
    $href->parentNode->removeAttribute('href');  // remove attribute
}
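Putting the DOM calls above together, here is a minimal sketch that collects the href and text of every link into an array. The sample markup in $html and the $links array are made up for illustration, and libxml_use_internal_errors() is added on the assumption that real pages often contain invalid HTML that would otherwise trigger warnings.

```php
<?php
// Hypothetical sample input; $html stands in for whatever markup you loaded.
$html = '<p><a title="this" href="that">what?</a> <a name="anchor">no href</a></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // Skip anchors that are not hyperlinks (no href attribute).
    if ($node->hasAttribute('href')) {
        $links[] = [
            'href' => $node->getAttribute('href'),
            'text' => $node->nodeValue,
        ];
    }
}

print_r($links);
```

With this sample input only the first anchor ends up in $links, because the second one has no href attribute. The position of href inside the tag does not matter here, which is the problem the original regex ran into.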
On a side note: I am sure this is a duplicate and you can find the answer somewhere in here.
I agree with Gordon, you MUST use an HTML parser to parse HTML. But if you really want a regex you can try this one:
/^<a.*?href=(["\'])(.*?)\1.*$/
This matches <a at the beginning of the string, followed by any number of any char (non greedy) .*? then href= followed by the link surrounded by either " or '. The backreference \1 makes the closing quote match the opening one.
$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);
Output:
array(3) {
[0]=>
string(37) "<a title="this" href="that">what?</a>"
[1]=>
string(1) """
[2]=>
string(4) "that"
}
The pattern you want to look for would be the link anchor pattern, like (something):
$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";
Why don't you just match
"<a.*?href\s*=\s*['"](.*?)['"]"
<?php
$str = '<a title="this" href="that">what?</a>';
$res = array();
preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);
var_dump($res);
?>
then
$ php test.php
array(2) {
[0]=>
array(1) {
[0]=>
string(27) "<a title="this" href="that""
}
[1]=>
array(1) {
[0]=>
string(4) "that"
}
}
which works. I've just removed the first capture group.
For anyone who still hasn't found a solution, here is a quick and easy one using SimpleXML:
$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
It's working for me.
Quick test: <a\s+[^>]*href=(\"\'??)([^\1]+)(?:\1)>(.*)<\/a>
seems to do the trick, with the 1st match being " or ', the second the 'href' value 'that', and the third the 'what?'.
The reason I left the first match of "/' in there is that you can use it to backreference it later for the closing "/' so it's the same.
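To illustrate the backreference idea in PHP, here is a hedged sketch rather than a drop-in replacement for the pattern above. It assumes the href value is always quoted, and it uses a lazy (.*?)\1 instead of a negated class, because as far as I know \1 inside a character class is not treated as a backreference in PCRE. The sample string and the $pattern variable are made up for the example.

```php
<?php
// Hypothetical sample input with both quote styles and href in different positions.
$str = '<a title="this" href="that">what?</a> <a href=\'other\' class="x">more</a>';

// Group 1: opening quote; group 2: href value, lazily up to the same quote (\1);
// group 3: link text, lazily up to the nearest </a>.
$pattern = '/<a\s[^>]*?href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[2], ' => ', $m[3], PHP_EOL;
}
```

This still has the usual regex-on-HTML weaknesses: it would also accept attributes that merely end in href= (such as data-href=), and it ignores unquoted values, so a parser remains the safer choice.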
See live example on: http://www.rubular.com/r/jsKyK2b6do
I'm not sure what you're trying to do here, but if you're trying to validate the link then look at PHP's filter_var()
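As a rough sketch of that suggestion, filter_var() with FILTER_VALIDATE_URL can be applied to each extracted value. The $candidates list is invented for the example.

```php
<?php
// Hypothetical candidate values, for illustration only.
$candidates = ['https://example.com/page?id=1', 'that', 'www.something.com'];

$results = [];
foreach ($candidates as $url) {
    // filter_var() returns the filtered string on success, or false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump($results);
```

Note that FILTER_VALIDATE_URL expects a scheme, so relative values such as "that" or scheme-less ones such as "www.something.com" are reported as invalid even though they can be perfectly usable href values.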
If you really need to use a regular expression then check out this tool, it may help: http://regex.larsolavtorvik.com/
Using your regex, I modified it a bit to suit your need.
<a.*?href=("|')(.*?)("|').*?>(.*)<\/a>
I personally suggest you use an HTML parser.
EDIT: Tested