I've written a ruby youtube url parser. It's designed to take an input of a youtube url of one of the following structures (these are currently the youtube url structures that I could find, maybe there's more?):
http://youtu.be/sGE4HMvDe-Q
http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1
The aim is to save just the id of the clip or playlist so that it can be embedded, so if it's a clip: 'sGE4HMvDe-Q'
, or if it's a playlist: 'p/A0C3C1D163BE880A'
The parser I wrote works fine for these urls, but seems a bit brittle and long-winded, I'm just wondering if someone could suggest a nicer ruby approach to this problem?
def parse_youtube
a = url.split('//').last.split('/')
b = a.last.split('watch?v=').last.split('?').first.split('&').first
if a[1] == 'p'
url = "p/#{b}"
else
url = b
end
end
URL parsing is a function of traffic management and load-balancing products that scan URLs to determine how to forward traffic across different links or into different servers. A URL includes a protocol identifier (http, for Web traffic) and a resource name, such as www.microsoft.com.
Locate a URL using a browser on a computerIn your browser, open YouTube. Find and click the video whose URL you want to see. The URL of the video will be in the address bar.
To use youtu.be manually, simply take a URL like http://www.youtube.com/watch?v=FdeioVndUhs and replace the “http://www.youtube.com/watch?v=” with “http://youtu.be/” to get: http://youtu.be/FdeioVndUhs Plug that shorter URL into a browser, and you'll see it redirects to that video.
def parse_youtube url
regex = /(?:.be\/|\/watch\?v=|\/(?=p\/))([\w\/\-]+)/
url.match(regex)[1]
end
urls = %w[http://youtu.be/sGE4HMvDe-Q
http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]
urls.each {|url| puts parse_youtube url }
# sGE4HMvDe-Q
# Lp7E973zozc
# p/A0C3C1D163BE880A
Depending on how you use this, you might want a better validation that the URL is indeed from youtube.
UPDATE:
Coming back to this a few years later. I've always been annoyed by how sloppy the original answer was. Since the validity of the Youtube domain wasn't validated anyway, I've removed some of the slop.
NODE EXPLANATION
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
be 'be'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
watch 'watch'
--------------------------------------------------------------------------------
\? '?'
--------------------------------------------------------------------------------
v= 'v='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
p 'p'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[\w\/\-]+ any character of: word characters (a-z,
A-Z, 0-9, _), '\/', '\-' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
Using the Addressable gem, you can save yourself some work. There's also a URI module in stdlib, but Addressable is more powerful.
require 'addressable/uri'
uri = Addressable::URI.parse(youtube_url)
if uri.path == "/watch"
uri.query_values["v"] if uri.query_values
else
uri.path
end
EDIT | Removed madness. Didn't notice Addressable provides #query_values
already.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With