
Regex to extract both video id or playlist id from youtube url

I would like to know how to extract a YouTube video ID or playlist ID, depending on the URL, using a single regular expression. The regex should also ensure that the domain is youtube.com. Here are some of the results I need:

Extract Playlist ID For

    https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    http://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    https://www.youtube.com/embed/videoseries?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r  

Extract Video ID For

    https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r
    https://www.youtube.com/watch?v=fqMfRi2gJok
    http://youtu.be/cCnrX1w5luM
    http://youtube.com/embed/cCnrX1w5luM
    http://youtube.com/v/cCnrX1w5luM
    https://www.youtube.com/v/cCnrX1w5luM
    www.youtube.com/v/cCnrX1w5luM
    youtube.com/v/cCnrX1w5luM

These are example URLs only. I need to extract the respective IDs for all possible YouTube link structures.

In short: extract the video ID and, if it is absent, the playlist ID.

jollykoshy asked Dec 09 '22 01:12


2 Answers

Your problem explicitly involves two patterns.

The first:

^.*?(?:v|list)=(.*?)(?:&|$)

This handles any URL that carries an explicit query parameter, that is, any URL containing an = symbol.

Explanation

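^.*?(?:v|list)=: Lazily matches any string up to the first v= or list=. Because .*? is lazy, whichever of the two appears first in the URL wins, so the video ID is preferred whenever v= comes before list=, as in the watch URL from the question.

(.*?)(?:&|$): Captures any string up to the next & symbol or, if there is none, up to the end of the line ($).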
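
As a rough illustration, the first pattern can be tried on its own in Python. This is only a sketch that assumes the standard `re` module; the name `first_param_id` is made up for the example and is not part of the answer.

```python
import re

# First pattern from the answer: capture the value after the first v= or list=.
PARAM_ID = re.compile(r'^.*?(?:v|list)=(.*?)(?:&|$)')

def first_param_id(url):
    """Return the value of the first v= or list= parameter, or None if there is none."""
    match = PARAM_ID.search(url.strip())
    return match.group(1) if match else None

# v= comes before list= here, so the video ID is returned.
print(first_param_id("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
# Only list= is present, so the playlist ID is returned.
print(first_param_id("https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
# No = symbol at all, so this pattern does not match.
print(first_param_id("http://youtu.be/cCnrX1w5luM"))
```

Note that this pattern alone does not check the domain; that part is handled by the prefix in the combined regex.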

The second:

^(?:(?!=).)*\/(.*)$

This handles any URL that has no query parameter, that is, no = symbol.

Explanation

^(?:(?!=).)*\/: Any run of characters other than = (enforced by the negative lookahead (?!=)), up to a / symbol,

(.*)$: Any string up to the end of the line.
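
The second pattern can be sketched the same way in Python (again assuming the standard `re` module; `last_segment` is a hypothetical helper name). The sketch also shows that, taken by itself, the pattern still matches a URL that contains an = symbol, so it relies on being tried only after the first pattern has failed.

```python
import re

# Second pattern from the answer: capture what follows the last "/"
# that can be reached without crossing an "=" symbol.
PATH_ID = re.compile(r'^(?:(?!=).)*\/(.*)$')

def last_segment(url):
    """Return the text after that last "/", or None if the URL has no "/"."""
    match = PATH_ID.search(url.strip())
    return match.group(1) if match else None

print(last_segment("http://youtu.be/cCnrX1w5luM"))
print(last_segment("youtube.com/v/cCnrX1w5luM"))
# Used alone on a URL with a parameter, the capture is the whole tail, not an ID.
print(last_segment("https://www.youtube.com/watch?v=fqMfRi2gJok"))
```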

Combining them into one regex, we get

^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?.*?(?:v|list)=(.*?)(?:&|$)|^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?(?:(?!=).)*\/(.*)$

Here,

(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)? is added to handle the various forms of YouTube URL (with or without the scheme and www., and the youtu.be short domain),

and this should get you what you want.
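
One way to use the combined regex from Python is sketched below. It assumes the standard `re` module; `extract_id` is a made-up name. Because the regex is an alternation with one capture group per branch, the ID ends up in group 1 (parameter form) or group 2 (path form), and the helper returns whichever one took part in the match.

```python
import re

# Combined regex from the answer, split over two lines for readability.
COMBINED = re.compile(
    r'^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?.*?(?:v|list)=(.*?)(?:&|$)'
    r'|^(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?(?:(?!=).)*\/(.*)$'
)

def extract_id(url):
    """Return the video or playlist ID from a YouTube URL, or None if nothing matches."""
    match = COMBINED.search(url.strip())
    if match is None:
        return None
    # Only one branch of the alternation matches, so only one group is set.
    return match.group(1) if match.group(1) is not None else match.group(2)

for url in (
    "https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r",
    "https://www.youtube.com/embed/videoseries?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r",
    "http://youtu.be/cCnrX1w5luM",
    "http://example.com/v/abc",  # not a YouTube URL, so None is expected
):
    print(url, "->", extract_id(url))
```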


IMPORTANT NOTE: The asker wants to extract an ID from www.youtube.com URLs and prefers the video ID over the playlist ID.

fronthem answered Jan 05 '23 00:01


https://regex101.com/r/mI3qY9/4

This regex assumes you are giving it a legitimate YouTube link. It grabs every v and list value together:

/(?:(?:\?|&)(?:v|list)=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

Breakdown:

/
(?:                         //non-capturing group
  (?:\?|&)(?:v|list)=       //? or & followed by v= or list=
  |                         //or
  embed\/                   //embed/
  |                         //or
  v\/                       //v/
  |                         //or
  youtu\.be\/               //youtu.be/
)
(                           //capturing group: the ID
  (?!videoseries)           //will not capture "videoseries"
  [a-zA-Z0-9_]*             //captures any run of letters, digits, or underscores that follows
)
/g                          //global
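
For illustration, here is how this regex might be applied in Python, where there is no /g flag and `re.findall` plays that role by returning every match. This is a sketch only; `all_ids` is a hypothetical helper name, and the last URL uses a made-up ID purely to show how the character class behaves.

```python
import re

# Same regex as above, without the JavaScript-style delimiters and /g flag.
ALL_IDS = re.compile(
    r'(?:(?:\?|&)(?:v|list)=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)'
)

def all_ids(url):
    """Return every v/list/embed/short-link ID found in the URL, in order of appearance."""
    return ALL_IDS.findall(url)

# Both the video ID and the playlist ID are returned, with nothing marking which is which.
print(all_ids("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
# "videoseries" is skipped by the negative lookahead; only the playlist ID remains.
print(all_ids("https://www.youtube.com/embed/videoseries?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
# Made-up ID containing a hyphen: the capture stops at the "-".
print(all_ids("https://www.youtube.com/watch?v=AAAA-BBBBBB"))
```

The character class `[a-zA-Z0-9_]` leaves out the hyphen, which can occur in YouTube IDs, so an ID containing one is cut short. Widening the class to `[a-zA-Z0-9_-]` is one possible adjustment, though that is a change to the answer's regex rather than part of it.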

But you may not be able to tell which match is the v and which is the list, so:

This only grabs the v:

/(?:(?:\?|&)v=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

This only grabs the list:

/(?:(?:\?|&)list=)((?!videoseries)[a-zA-Z0-9_]*)/g
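
The two separate regexes make it possible to try the video ID first and fall back to the playlist ID, which is what the question asks for. A minimal Python sketch, assuming the standard `re` module (the function name and the returned tuple shape are invented for this example):

```python
import re

# The "v only" and "list only" regexes from above, without delimiters and /g.
VIDEO_ID = re.compile(r'(?:(?:\?|&)v=|embed\/|v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)')
LIST_ID = re.compile(r'(?:(?:\?|&)list=)((?!videoseries)[a-zA-Z0-9_]*)')

def video_or_playlist(url):
    """Return ("video", id) if a video ID is found, else ("playlist", id), else None."""
    video = VIDEO_ID.search(url)
    if video:
        return ("video", video.group(1))
    playlist = LIST_ID.search(url)
    if playlist:
        return ("playlist", playlist.group(1))
    return None

print(video_or_playlist("https://www.youtube.com/watch?v=fqMfRi2gJok&index=1&list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
print(video_or_playlist("https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r"))
```

As written, these two regexes do not check the domain, so the sketch accepts any URL with a matching shape.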

This only grabs the v from YouTube URLs:

/(?:youtube\.com.*(?:\?|&)(?:v)=|youtube\.com.*embed\/|youtube\.com.*v\/|youtu\.be\/)((?!videoseries)[a-zA-Z0-9_]*)/g

This only grabs the list from YouTube URLs:

/(?:youtube\.com.*(?:\?|&)(?:list)=)((?!videoseries)[a-zA-Z0-9_]*)/g

These are basically the same patterns with youtube\.com.* added to the regex. They won't grab anything from, e.g., http://example.com/v/abc
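
A short Python sketch of the domain-restricted variants, assuming the standard `re` module (the constant names are made up). It checks that a non-YouTube URL is rejected while the YouTube forms from the question still yield an ID.

```python
import re

# Domain-restricted regexes from above, without delimiters and /g.
YT_VIDEO = re.compile(
    r'(?:youtube\.com.*(?:\?|&)(?:v)=|youtube\.com.*embed\/|youtube\.com.*v\/|youtu\.be\/)'
    r'((?!videoseries)[a-zA-Z0-9_]*)'
)
YT_LIST = re.compile(r'(?:youtube\.com.*(?:\?|&)(?:list)=)((?!videoseries)[a-zA-Z0-9_]*)')

# A look-alike path on another domain is not matched.
print(YT_VIDEO.search("http://example.com/v/abc"))

# The same path on youtube.com is matched.
print(YT_VIDEO.search("www.youtube.com/v/cCnrX1w5luM").group(1))

# Playlist ID from a playlist URL.
print(YT_LIST.search("https://www.youtube.com/playlist?list=PLuC2HflhhpLGQ4RgqA76_Gv52fGA0909r").group(1))
```

These patterns only require that youtube.com or youtu.be/ appears somewhere before the ID; they are not anchored to the start of the URL, so this is a loose domain check rather than full validation.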

https://regex101.com/r/mI3qY9/5

Explanation:

youtube\.com.*          //matches youtube.com followed by any number of characters
Daniel Cheung answered Jan 04 '23 23:01