 

Grab URL within a string which contains HTML code

Tags:

php

I have a string, for example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

And I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in variable $first_found_youtube_url.

How can I do this efficiently?

I could use preg_match or strpos to look for the URLs, but I'm not sure which approach is more appropriate.

Henrik Petterson asked Apr 27 '26 07:04



2 Answers

I wrote this function a while back. It uses a regex and returns an array of unique URLs. Since you want the first one, you can just use the first item in the array.

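function getUrlsFromString($string) {
    // Match http(s) URLs, stopping at whitespace, parentheses and angle brackets.
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping the order in which the URLs appear,
    // so that index 0 is the first URL found (sorting by length would break that).
    return array_values(array_unique($matches[0]));
}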

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null when no URL was found

With a YouTube-specific regex:

function getYoutubeUrlsFromString($string) {
    // Dots are escaped so they match literally; the ID class also allows "_" and "-".
    $regex = '#(https?:\/\/(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]+))#i';
    preg_match_all($regex, $string, $matches);
    // Drop duplicates and reindex, keeping the order in which the URLs appear,
    // so that index 0 is the first YouTube URL found.
    return array_values(array_unique($matches[0]));
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube_url = $urls[0] ?? null; // null when no URL was found
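
If only the first match is needed, preg_match() may be enough, since it stops at the first hit instead of collecting every URL. A minimal sketch, assuming video IDs consist only of letters, digits, underscores and hyphens; getFirstYoutubeUrl is a hypothetical name, not one of the functions above:

```php
<?php
// Hypothetical helper: returns the first YouTube URL in $string,
// or null when there is none.
function getFirstYoutubeUrl($string) {
    // Assumption: video IDs contain only letters, digits, "_" and "-".
    $regex = '#https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+#i';
    // preg_match() returns 1 as soon as it finds the first match,
    // so the result is the earliest URL in the string.
    if (preg_match($regex, $string, $match) === 1) {
        return $match[0];
    }
    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = getFirstYoutubeUrl($html);
```

For the sample string this should leave https://www.youtube.com/watch?v=7HknMcG2qYo in $first_found_youtube_url. Because it works on the raw text, it would also pick up a URL that appears outside an href attribute.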
skrilled answered Apr 29 '26 21:04



You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
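$DOMD = new DOMDocument();
// loadHTML() must be called on an instance; the static call throws an Error on PHP 8.
// The @ suppresses warnings about imperfect markup.
@$DOMD->loadHTML($html);
// Prefixes to accept; the bare-domain forms cover links such as https://youtube.com/...
$prefixes = array("https://www.youtube.com/", "https://youtube.com/", "https://www.youtu.be/", "https://youtu.be/");

foreach ($DOMD->getElementsByTagName("a") as $url)
{
    $href = $url->getAttribute("href");
    foreach ($prefixes as $prefix)
    {
        if (0 === stripos($href, $prefix))
        {
            $first_found_youtube_url = $href;
            break 2; // leave both loops at the first hit
        }
    }
}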

Personally, I would probably compare the host instead:

in_array(parse_url($url->getAttribute("href"), PHP_URL_HOST), array("youtube.com", "www.youtube.com", "youtu.be"), true)

That accepts both http and https links, which is probably what you want, although strictly speaking it is not what the question currently asks for.
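
The host comparison described above could be wrapped up roughly as follows. This is only a sketch: firstYoutubeHref is a hypothetical name, and the list of accepted hosts is an assumption you may want to extend (for example with m.youtube.com).

```php
<?php
// Hypothetical helper: returns the href of the first <a> element whose host
// is a YouTube domain, or null when no such link exists.
function firstYoutubeHref($html) {
    // Assumption: these three hosts are the ones worth accepting.
    $hosts = array('youtube.com', 'www.youtube.com', 'youtu.be');

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since HTML fragments are rarely perfectly formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url() gives null or false when there is no usable host.
        if (is_string($host) && in_array(strtolower($host), $hosts, true)) {
            return $href;
        }
    }
    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = firstYoutubeHref($html);
```

Comparing the parsed host rather than a string prefix means the scheme does not matter, and a link such as https://example.com/?u=youtube.com is not mistaken for a YouTube URL.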

hanshenrik answered Apr 29 '26 21:04



