Extract the video ID from a YouTube URL in .NET

Tags: c#, .net, regex

I am struggling with a regex that extracts the video ID from a YouTube URL.

"(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";

It works in the sense that it matches the video ID, but I want to restrict it to the YouTube domains: it should not match the ID if the domain is anything other than youtube.com or youtu.be. Unfortunately, I do not understand this regex well enough to add the restriction.

I want to match the ID only when the domain is one of:

  • www.youtube.com
  • youtube.com
  • youtu.be
  • www.youtu.be

with or without http:// or https:// at the front.

The regex above successfully matches the video ID in the following examples:

"http://youtu.be/AAAAAAAAA01"
"http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02"
"http://www.youtube.com/embed/watch?v=AAAAAAAAA03"
"http://www.youtube.com/embed/v=AAAAAAAAA04"
"http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05"
"http://www.youtube.com/watch?v=AAAAAAAAA06"
"http://www.youtube.com/v/AAAAAAAAA07"
"www.youtu.be/AAAAAAAAA08"
"youtu.be/AAAAAAAAA09"
"http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related"
"http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17"
"http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0"
"http://www.youtube.com/watch/AAAAAAAAA11"

The code that currently checks the URL is:

    private const string YoutubeLinkRegex = "(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";
    private static readonly Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);


    public string ExtractVideoIdFromUrl(string url)
    {
        //extract the id
        var regRes = regexExtractId.Match(url);
        if (regRes.Success)
        {
            return regRes.Groups[1].Value;
        }
        return null;
    }
Asked by Menelaos Vergis, Sep 29 '16 at 18:09


2 Answers

You do not need regular expressions here:

using System;
using System.Web; // HttpUtility.ParseQueryString

var url = @"https://www.youtube.com/watch?v=6QlW4m9xVZY";
var uri = new Uri(url);

// To restrict the domain, check uri.Host here (it is "www.youtube.com" for this URL).

var query = HttpUtility.ParseQueryString(uri.Query);
var videoId = query["v"];

// videoId == "6QlW4m9xVZY"

The example above works when the video ID is passed as the v query parameter. If the video ID is a path segment instead, you can use this:

// Last() requires: using System.Linq;
var url = "http://youtu.be/AAAAAAAAA09";
var uri = new Uri(url);

var videoid = uri.Segments.Last(); // AAAAAAAAA09
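
As a rough illustration of what uri.Segments contains, the sketch below returns the last path segment with any trailing slash removed. The class and method names are hypothetical and only serve the example. Segments keeps the slash that ends each segment, so "http://www.youtube.com/v/AAAAAAAAA07/" splits into "/", "v/" and "AAAAAAAAA07/".

```csharp
using System;
using System.Linq;

public static class SegmentsDemo
{
    // Hypothetical helper: last path segment of an absolute URL,
    // without the trailing '/' that Uri.Segments may leave on it.
    public static string LastSegment(string url)
    {
        var uri = new Uri(url);
        return uri.Segments.Last().TrimEnd('/');
    }
}
```

Note that for a URL such as "http://www.youtube.com/watch?v=AAAAAAAAA06" the last segment is "watch", not the video ID, because the query string is not part of Segments.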

Combining the two, we get:

var url = @"https://www.youtube.com/watch?v=Lvcyj1GfpGY&list=PLolZLFndMkSIYef2O64OLgT-njaPYDXqy";
var uri = new Uri(url);

// To restrict the domain, check uri.Host here (it is "www.youtube.com" for this URL).

var query = HttpUtility.ParseQueryString(uri.Query);

var videoId = string.Empty;

if (query.AllKeys.Contains("v"))
{
    videoId = query["v"];
}
else
{
    videoId = uri.Segments.Last();
}

Of course, I don't know anything about your requirements, but I hope it helps.
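
The host check mentioned in the comments could look something like the sketch below. It assumes the four hosts listed in the question are the only ones to accept; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class YouTubeHost
{
    // The hosts listed in the question; adjust the list to your own requirements.
    private static readonly string[] Allowed =
    {
        "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be"
    };

    // Returns true only when the URI's host is one of the allowed hosts.
    public static bool IsYouTube(Uri uri)
    {
        return Allowed.Contains(uri.Host, StringComparer.OrdinalIgnoreCase);
    }
}
```

Calling YouTubeHost.IsYouTube(uri) before reading the query string or the segments would reject URLs from other domains.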

Answered by tym32167, Oct 12 '22 at 01:10


tym32167's answer throws an exception at var uri = new Uri(url); when url doesn't have a scheme, like "www.youtu.be/AAAAAAAAA08".
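
One way to tolerate a missing scheme is to prepend "http://" before parsing. This is only a minimal sketch of that idea, not the approach taken in the code further down (which uses UriBuilder); the class and method names are hypothetical.

```csharp
using System;

public static class LenientUrl
{
    // Hypothetical helper: parses a URL, assuming "http://" when no
    // http/https scheme is present. Returns false instead of throwing.
    public static bool TryParse(string url, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(url))
            return false;

        bool hasScheme =
            url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

        string candidate = hasScheme ? url : "http://" + url;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri);
    }
}
```

The result still needs a host check afterwards, because a malformed input such as "ww.youtube.com/v/AAAAAAAAA13" parses successfully but points at a different host.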

It also returns the wrong video ID for some URLs:

  • "http://www.youtube.com/embed/v=AAAAAAAAA04" -> "v=AAAAAAAAA04"
  • "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA" -> "attribution_link"
  • "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail" -> "attribution_link"
  • "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17" -> "attribution_link"

So here is my code, based on tym32167's:

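    // Requires: using System; using System.Linq; using System.Text.RegularExpressions; using System.Web;
    private static string GetYouTubeVideoIdFromUrl(string url)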
    {
        Uri uri = null;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            try
            {
                uri = new UriBuilder("http", url).Uri;
            }
            catch
            {
                // invalid url
                return "";
            }
        }

        string host = uri.Host;
        string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };
        if (!youTubeHosts.Contains(host))
            return "";

        var query = HttpUtility.ParseQueryString(uri.Query);

        if (query.AllKeys.Contains("v"))
        {
            return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value;
        }
        else if (query.AllKeys.Contains("u"))
        {
            // some urls have something like "u=/watch?v=AAAAAAAAA16"
            return Regex.Match(query["u"], @"/watch\?v=([a-zA-Z0-9_-]{11})").Groups[1].Value;
        }
        else
        {
            // remove a trailing forward slash
            var last = uri.Segments.Last().Replace("/", "");
            if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$"))
                return last.Replace("v=", "");

            string[] segments = uri.Segments;
            if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/")
                return "";

            return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value;
        }
    }

Let's test it.

        string[] urls = {"http://youtu.be/AAAAAAAAA01",
            "http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02",
            "http://www.youtube.com/embed/watch?v=AAAAAAAAA03",
            "http://www.youtube.com/embed/v=AAAAAAAAA04",
            "http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05",
            "http://www.youtube.com/watch?v=AAAAAAAAA06",
            "http://www.youtube.com/v/AAAAAAAAA07",
            "www.youtu.be/AAAAAAAAA08",
            "youtu.be/AAAAAAAAA09",
            "http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related",
            "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA",
            "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail",
            "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17",
            "http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0",
            "http://www.youtube.com/watch/AAAAAAAAA11",};

        Console.WriteLine("***Youtube urls***");
        foreach (string url in urls)
        {
            Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
        }

        string[] invalidUrls = {
            "ww.youtube.com/v/AAAAAAAAA13",
            "http:/www.youtube.com/v/AAAAAAAAA13",
            "http://www.youtub1e.com/v/AAAAAAAAA13",
            "http://www.vimeo.com/v/AAAAAAAAA13",
            "www.youtube.com/b/AAAAAAAAA13",
            "www.youtube.com/v/AAAAAAAAA1",
            "www.youtube.com/v/AAAAAAAAA1&",
            "www.youtube.com/v/AAAAAAAAA1/",
            ".youtube.com/v/AAAAAAAAA13"};

        Console.WriteLine("***Invalid youtube urls***");
        foreach (string url in invalidUrls)
        {
            Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
        }

Result (every URL gives the expected output):

***Youtube urls***
http://youtu.be/AAAAAAAAA01
-> AAAAAAAAA01
http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02
-> AAAAAAAAA02
http://www.youtube.com/embed/watch?v=AAAAAAAAA03
-> AAAAAAAAA03
http://www.youtube.com/embed/v=AAAAAAAAA04
-> AAAAAAAAA04
http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05
-> AAAAAAAAA05
http://www.youtube.com/watch?v=AAAAAAAAA06
-> AAAAAAAAA06
http://www.youtube.com/v/AAAAAAAAA07
-> AAAAAAAAA07
www.youtu.be/AAAAAAAAA08
-> AAAAAAAAA08
youtu.be/AAAAAAAAA09
-> AAAAAAAAA09
http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related
-> i-AAAAAAA14
http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
-> AAAAAAAAA15
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail
-> AAAAAAAAA16
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17
-> AAAAAAAAA17
http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0
-> A-AAAAAAA18
http://www.youtube.com/watch/AAAAAAAAA11
-> AAAAAAAAA11



***Invalid youtube urls***
ww.youtube.com/v/AAAAAAAAA13
-> 
http:/www.youtube.com/v/AAAAAAAAA13
-> 
http://www.youtub1e.com/v/AAAAAAAAA13
-> 
http://www.vimeo.com/v/AAAAAAAAA13
-> 
www.youtube.com/b/AAAAAAAAA13
-> 
www.youtube.com/v/AAAAAAAAA1
-> 
www.youtube.com/v/AAAAAAAAA1&
-> 
www.youtube.com/v/AAAAAAAAA1/
-> 
.youtube.com/v/AAAAAAAAA13
-> 
Answered by dixhom, Oct 12 '22 at 01:10