I have a textfield where users can write anything.
For example:
Lorem Ipsum is simply dummy text. http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Now I would like to parse it and find all YouTube video URLs and their ids.
Any idea how that works?
A YouTube video ID is a string of 11 characters drawn from upper- and lower-case letters, digits, hyphens, and underscores. It uniquely identifies a YouTube video.
To get the video ID from a youtube.com page URL, whether you are watching the video or just followed a link to it, look in the URL of the video page: the ID is located right after the v= URL parameter.
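As a rough illustration of reading the ID from the v= parameter, here is a minimal JavaScript sketch. The function name getWatchParamId is hypothetical, and the code assumes a runtime that provides the standard URL class (modern browsers, Node.js). It only handles watch-page style URLs that carry a v= parameter.

```javascript
// Hypothetical helper: read the v= query parameter from a watch-page URL.
function getWatchParamId(urlString) {
    let url;
    try {
        url = new URL(urlString);
    } catch (err) {
        return null; // Not a parseable absolute URL.
    }
    const id = url.searchParams.get('v');
    // Accept only values that look like an 11-character video ID.
    return id !== null && /^[\w-]{11}$/.test(id) ? id : null;
}

console.log(getWatchParamId('http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu'));
// prints "A_6gNZCkajU"
```

URLs that keep the ID in the path or in the fragment return null here and need separate handling.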
A YouTube video URL may be encountered in a variety of formats:
http://youtu.be/NLqAF9hrVbY
http://www.youtube.com/embed/NLqAF9hrVbY
https://www.youtube.com/embed/NLqAF9hrVbY
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
http://www.youtube.com/watch?v=NLqAF9hrVbY
http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
http://www.youtube-nocookie.com
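The formats above keep the ID in different places: in the v= parameter, in the last path segment, or in the last segment of the # fragment. As an alternative to a single regex, the following sketch parses one URL at a time with the standard URL class and checks those three places in turn. The name extractYouTubeId and the order of the checks are assumptions made for illustration, not part of this answer. Any 11-character final path segment on a YouTube host is treated as an ID, so some non-video URLs could produce false positives.

```javascript
// Hypothetical helper: find the video ID in a single YouTube URL.
function extractYouTubeId(urlString) {
    let url;
    try {
        url = new URL(urlString);
    } catch (err) {
        return null; // Not a parseable absolute URL.
    }
    const host = url.hostname.toLowerCase();
    const isYouTube = host === 'youtu.be' ||
        /(^|\.)youtube(-nocookie)?\.com$/.test(host);
    if (!isYouTube) {
        return null;
    }
    const looksLikeId = (s) => typeof s === 'string' && /^[\w-]{11}$/.test(s);

    // 1. Query formats: watch?v=ID, ytscreeningroom?v=ID
    const v = url.searchParams.get('v');
    if (looksLikeId(v)) {
        return v;
    }
    // 2. Fragment formats: #p/u/1/ID, #p/c/PLAYLIST/2/ID
    const hashLast = url.hash.split('/').pop();
    if (looksLikeId(hashLast)) {
        return hashLast;
    }
    // 3. Path formats: youtu.be/ID, /embed/ID, /v/ID, /feeds/api/videos/ID
    const pathLast = url.pathname.split('/').filter(Boolean).pop();
    return looksLikeId(pathLast) ? pathLast : null;
}

console.log(extractYouTubeId('http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo'));
// prints "1p3vcRhsYGo"
```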
Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):
    // Linkify youtube URLs which are not already links.
    function linkifyYouTubeURLs($text) {
        $text = preg_replace('~(?#!js YouTubeId Rev:20160125_1800)
            # Match non-linked youtube URL in the wild. (Rev:20130823)
            https?://          # Required scheme. Either http or https.
            (?:[0-9A-Z-]+\.)?  # Optional subdomain.
            (?:                # Group host alternatives.
              youtu\.be/       # Either youtu.be,
            | youtube          # or youtube.com or
              (?:-nocookie)?   # youtube-nocookie.com
              \.com            # followed by
              \S*?             # Allow anything up to VIDEO_ID,
              [^\w\s-]         # but char before ID is non-ID char.
            )                  # End host alternatives.
            ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
            (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
            (?!                # Assert URL is not pre-linked.
              [?=&+%\w.-]*     # Allow URL (query) remainder.
              (?:              # Group pre-linked alternatives.
                [\'"][^<>]*>   # Either inside a start tag,
              | </a>           # or inside <a> element text contents.
              )                # End recognized pre-linked alts.
            )                  # End negative lookahead assertion.
            [?=&+%\w.-]*       # Consume any URL (query) remainder.
            ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
            $text);
        return $text;
    }
And here is a JavaScript version with the exact same regex (with comments removed):
    // Linkify youtube URLs which are not already links.
    function linkifyYouTubeURLs(text) {
        var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
        return text.replace(re,
            '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
    }
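The question asks for a list of the URLs and their IDs rather than for links. The same regex can do that if each match is collected instead of replaced. The sketch below is one way to do it; the name findYouTubeVideos and the shape of the returned objects are assumptions, and it relies on String.prototype.matchAll being available in the runtime.

```javascript
// Hypothetical helper: list every non-linked YouTube URL in the text with its ID.
function findYouTubeVideos(text) {
    // Same regex as linkifyYouTubeURLs(); the "g" flag is required by matchAll().
    const re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    // m[0] is the whole matched URL, m[1] is the captured VIDEO_ID.
    return Array.from(text.matchAll(re), (m) => ({ url: m[0], id: m[1] }));
}

const sample = 'Lorem Ipsum is simply dummy text. ' +
    'http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry. ' +
    'http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised in the 1960s.';

for (const video of findYouTubeVideos(sample)) {
    console.log(video.id + ' <- ' + video.url);
}
// prints:
// DUQi_R4SgWo <- http://www.youtube.com/watch?v=DUQi_R4SgWo
// A_6gNZCkajU <- http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu
```

Because the pre-linked lookahead is kept, URLs that already sit inside an a element are skipped, just as they are in the linkify functions.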
Notes:
- The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
- The replacement string can be modified to suit. The one provided above simply creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".

Edit 2011-07-05: Added - hyphen to ID char class.
Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of URL following YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.
Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.
Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.
Edit 2011-08-25: Several modifications:
- Added a JavaScript version of the linkifyYouTubeURLs() function.
- The previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
- Added + and % to the character class matching the query string.
- Changed the PHP version's regex delimiter from % to ~.
Edit 2011-10-12: YouTube URL host part may now have any subdomain (not just www.).
Edit 2012-05-01: The consume URL section may now allow for '-'.
Edit 2013-08-23: Added additional format provided by @Mei. (The query part may have a . dot.)
Edit 2013-11-30: Added additional format provided by @CRONUS: youtube-nocookie.com.
Edit 2016-01-25: Fixed regex to handle error case provided by CRONUS.
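The 2011-08-25 remark about the \b word boundary is easy to check. A boundary only exists between a word character and a non-word character, so there is none between "/" and a leading "-", or between a trailing "-" and the end of the string. The sketch below compares a \b-based pattern with the explicit lookahead used in the current regex. Both patterns are simplified to the youtu.be form, and the IDs with dashes are made up for the demonstration.

```javascript
// Simplified patterns for youtu.be URLs only (illustration, not the full regex).
const withWordBoundary = /youtu\.be\/\b([\w-]{11})\b/;
const withExplicitChecks = /youtu\.be\/([\w-]{11})(?=[^\w-]|$)/;

// Return the captured ID, or null when the pattern does not match.
function idOf(re, url) {
    const m = re.exec(url);
    return m ? m[1] : null;
}

// Hypothetical IDs that begin or end with a dash.
const leadingDash = 'http://youtu.be/-abcdefghij';
const trailingDash = 'http://youtu.be/abcdefghij-';

console.log(idOf(withWordBoundary, leadingDash));    // prints null
console.log(idOf(withWordBoundary, trailingDash));   // prints null
console.log(idOf(withExplicitChecks, leadingDash));  // prints "-abcdefghij"
console.log(idOf(withExplicitChecks, trailingDash)); // prints "abcdefghij-"
```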
Here's a method I once wrote for a project that extracts YouTube and Vimeo video keys:
    /**
     * Strip important information out of any video link.
     *
     * @param  string $vid_link Link to a video on the hoster's page
     * @return mixed  FALSE on failure, array on success
     */
    function getHostInfo($vid_link)
    {
        // YouTube: get the video id.
        // strpos() returns 0 for a match at the start, so compare with FALSE explicitly.
        if (strpos($vid_link, 'youtu') !== FALSE) {
            // Regular links (...?v=VIDEO_ID). The hyphen goes last in the
            // character class so it is not read as a range.
            if (preg_match('/(?<=v=)([\w-]+)/', $vid_link, $matches)) {
                return array('host_name' => 'youtube', 'original_key' => $matches[0]);
            }
            // Ajax hash tag links: the id is the last segment of the URL.
            // "~" replaces the original "§" delimiter, which is multi-byte in UTF-8.
            if (preg_match('~([\w-]+)$~i', $vid_link, $matches)) {
                return array('host_name' => 'youtube', 'original_key' => $matches[0]);
            }
            return FALSE;
        }

        // Vimeo: get the video id (first run of digits after a slash).
        if (strpos($vid_link, 'vimeo') !== FALSE) {
            if (preg_match('~(?<=/)(\d+)~', $vid_link, $matches)) {
                return array('host_name' => 'vimeo', 'original_key' => $matches[0]);
            }
            return FALSE;
        }

        return FALSE;
    }