I have a string, for example:
$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in the variable $first_found_youtube_url.
How can I do this efficiently?
I could use preg_match or strpos to look for the URLs, but I'm not sure which approach is more appropriate.
I wrote this function a while back. It uses a regex and returns an array of unique URLs. Since you want the first one, you can just use the first item in the array.
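function getUrlsFromString($string) {
    // Match http(s) URLs, stopping before whitespace, brackets and trailing punctuation.
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping the order in which the URLs appear,
    // so that index 0 is the first URL found (sorting by length would break that).
    return array_values(array_unique($matches[0]));
}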
Example:
$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null when no URL was found
With a YouTube-specific regex:
function getYoutubeUrlsFromString($string) {
    // Dots are escaped so they only match a literal "."; the ID part also allows "_" and "-".
    $regex = '#https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping the order in which the URLs appear,
    // so that index 0 is the first YouTube URL found.
    return array_values(array_unique($matches[0]));
}
Example:
$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null when no YouTube URL was found
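If only the first URL is needed, preg_match may be a better fit than preg_match_all, because it stops scanning at the first match. The following is a minimal sketch, not a definitive implementation: the function name findFirstYoutubeUrl is hypothetical, and the pattern assumes that only youtube.com/watch?v=ID and youtu.be/ID links matter.

```php
<?php
// Hypothetical helper: returns the first YouTube URL in $string, or null if there is none.
function findFirstYoutubeUrl(string $string): ?string
{
    // Assumed URL shapes: youtube.com/watch?v=ID (optionally with www.) and youtu.be/ID.
    $regex = '#https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+#i';

    // preg_match() returns 1 as soon as it finds a match and does not scan further.
    if (preg_match($regex, $string, $match) === 1) {
        return $match[0];
    }

    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = findFirstYoutubeUrl($html);
echo $first_found_youtube_url, PHP_EOL; // https://www.youtube.com/watch?v=7HknMcG2qYo
```

Returning null rather than an empty array makes the "nothing found" case explicit for the caller.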
You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:
$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
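// loadHTML() is an instance method; calling it statically fails on PHP 8.
$DOMD = new DOMDocument();
@$DOMD->loadHTML($html); // @ silences warnings about malformed HTML
$prefixes = array("https://www.youtube.com/", "https://youtube.com/", "https://youtu.be/");
foreach ($DOMD->getElementsByTagName("a") as $url)
{
    $href = $url->getAttribute("href");
    foreach ($prefixes as $prefix)
    {
        if (0 === stripos($href, $prefix))
        {
            $first_found_youtube_url = $href;
            break 2; // leave both loops at the first YouTube link
        }
    }
}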
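Personally, I would probably compare the host instead:
in_array(parse_url($url->getAttribute("href"), PHP_URL_HOST), array("youtube.com", "www.youtube.com", "youtu.be"), true)
That matches both http and https links, which is probably what you want, although strictly speaking it is not what the question currently asks for. List every host you want to accept, because parse_url returns www.youtube.com and youtube.com as different hosts.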
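The DOMDocument and parse_url ideas above can be combined into one helper. This is only a sketch under some assumptions: the function name findFirstYoutubeLink and the list of accepted hosts are hypothetical, the DOM extension is assumed to be installed, and libxml_use_internal_errors is used instead of @ to silence parser warnings.

```php
<?php
// Hypothetical helper: returns the href of the first <a> that points at a YouTube host, or null.
function findFirstYoutubeLink(string $html): ?string
{
    // Assumed list of hosts that count as YouTube; extend it as needed.
    $hosts = ['youtube.com', 'www.youtube.com', 'youtu.be'];

    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect parser warnings about malformed HTML internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() yields the links in document order.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url() returns null or false when there is no usable host.
        if (is_string($host) && in_array(strtolower($host), $hosts, true)) {
            return $href;
        }
    }

    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = findFirstYoutubeLink($html);
echo $first_found_youtube_url, PHP_EOL; // https://www.youtube.com/watch?v=7HknMcG2qYo
```

Comparing the parsed host rather than a string prefix accepts both http and https links, and it does not match hosts that merely begin with the same characters.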