I have this regex to match image URLs in HTML code:
$regex = '#[\w,=/:.-]+\.(?:jpe?g|png|gif)#iu';
PHP demo:
$input = <<<HTML
<a href="https://e...content-available-to-author-only...e.com/example1.jpg">
<a href="https://e...content-available-to-author-only...e.com/ストスト.jpg">
<a href="https://e...content-available-to-author-only...e.com/example3.jpg">
<a href="https://e...content-available-to-author-only...e.com/example3.bak">
HTML;
$dom = new DomDocument();
$dom->loadHTML(mb_convert_encoding($input, 'HTML-ENTITIES', "UTF-8"));
$anchors = $dom->getElementsByTagName("a");
$regex = '#^[\w,=/:.-]+\.(?:jpe?g|png|gif)$#iu';
foreach ($anchors as $anchor) {
    $res = $anchor->getAttribute("href");
    if (preg_match($regex, $res)) {
        echo "Valid url: $res" . PHP_EOL;
    } else {
        echo "Invalid url: $res" . PHP_EOL;
    }
}
My question is: how can I make it match only if the URL starts with http or //? Currently it also matches example.jpg, which isn't a full URL.
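A quick check makes the problem concrete: the anchored pattern only validates the allowed characters and the extension, not how the string starts, so a bare file name passes just like a full URL. The two sample strings below are hypothetical.

```php
<?php
// Same anchored pattern as above: it checks characters and extension only.
$regex = '#^[\w,=/:.-]+\.(?:jpe?g|png|gif)$#iu';

// Hypothetical samples: one full URL and one bare file name.
$samples = ['https://example.com/example1.jpg', 'example.jpg'];

foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    echo $sample, ' => ', preg_match($regex, $sample), PHP_EOL;
}
```

Both lines print `=> 1`, which is exactly the behaviour being asked about.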
I'd suggest this pattern: href="((?:http|\/\/)[^"]+\.(?:jpe?g|png|gif))"
Explanation:
href=" - match href=" literally; this ensures you only match a link's href attribute
(...) - capturing group that stores the actual link
(?:...) - non-capturing group
http|\/\/ - match http or // (the slashes are escaped in case / is used as the pattern delimiter)
[^"]+ - match one or more characters other than "
\. - match . literally
jpe?g|png|gif - alternation: match one of jpeg, jpg (thanks to e?), png or gif
" - match " literally
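A minimal PHP sketch of how the pattern could be used on a raw HTML string with preg_match_all(). The # delimiters and the sample markup are my own assumptions, not part of the original demo; because # is the delimiter, the slashes need no escaping.

```php
<?php
// The suggested pattern wrapped in # delimiters.
// No i modifier, so upper-case extensions such as .JPG are not matched.
$pattern = '#href="((?:http|//)[^"]+\.(?:jpe?g|png|gif))"#';

// Hypothetical input: full URL, protocol-relative URL, bare file name, non-image.
$html = <<<HTML
<a href="https://example.com/example1.jpg">
<a href="//cdn.example.com/photo.png">
<a href="example.jpg">
<a href="https://example.com/example3.bak">
HTML;

// $matches[0] holds the whole matches, $matches[1] the first capturing group.
preg_match_all($pattern, $html, $matches);

foreach ($matches[1] as $url) {
    echo $url, PHP_EOL;
}
```

Only the first two links are printed. If upper-case extensions matter, add the i modifier after the closing #.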
The matched link will be in the first capturing group.
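If you'd rather keep the DOMDocument loop from the question, the same http-or-// requirement can be placed at the start of the anchored regex. This is a sketch with hypothetical sample links; since they are plain ASCII, the mb_convert_encoding() step is left out.

```php
<?php
// Hypothetical sample input (not the question's original URLs).
$input = <<<HTML
<a href="https://example.com/example1.jpg">
<a href="//cdn.example.com/photo.png">
<a href="example.jpg">
<a href="https://example.com/example3.bak">
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($input);
libxml_clear_errors();

// The question's anchored pattern with a required http or // prefix.
$regex = '#^(?:http|//)[\w,=/:.-]+\.(?:jpe?g|png|gif)$#iu';

$valid = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    if (preg_match($regex, $href)) {
        $valid[] = $href;
        echo "Valid url: $href", PHP_EOL;
    } else {
        echo "Invalid url: $href", PHP_EOL;
    }
}
```

Note that (?:http|//) also accepts strings such as httpfoo.jpg; a stricter prefix like (?:https?:)?// would only allow http://, https:// and protocol-relative URLs.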