I'm writing a regex to find YouTube links (there can be several) in a piece of HTML text posted by a user.
Currently I'm using the following regex to replace 'http://www.youtube.com/watch?v=-JyZLS2IhkQ' with an embed of the corresponding YouTube video:
return re.compile('(http(s|):\/\/|)(www.|)youtube.(com|nl)\/watch\?v\=([a-zA-Z0-9-_=]+)').sub(tag, value)
(where the variable 'tag' is a bit of HTML that embeds the video and 'value' is a user post)
This works, until the URL looks like this:
'http://www.youtube.com/watch?v=-JyZLS2IhkQ&feature...'
I'm hoping you could help me figure out how to also match the '&feature...' part so it disappears.
Example HTML:
No replies to this post..
Youtube vid:
http://www.youtube.com/watch?v=-JyZLS2IhkQ
More blabla
Thanks for your thoughts, much appreciated
Stefan
Here is how I'm solving it:
import re

def youtube_url_validation(url):
    # Returns a match object for a YouTube URL, or None if it does not match.
    # Group 6 holds the 11-character video ID.
    youtube_regex = (
        r'(https?://)?(www\.)?'
        r'(youtube|youtu|youtube-nocookie)\.(com|be)/'
        r'(watch\?v=|embed/|v/|.+\?v=)?([^&=%\?]{11})')
    return re.match(youtube_regex, url)
TESTS:
youtube_urls_test = [
    'http://www.youtube.com/watch?v=5Y6HSHwhVlY',
    'http://youtu.be/5Y6HSHwhVlY',
    'http://www.youtube.com/embed/5Y6HSHwhVlY?rel=0" frameborder="0"',
    'https://www.youtube-nocookie.com/v/5Y6HSHwhVlY?version=3&hl=en_US',
    'http://www.youtube.com/',
    'http://www.youtube.com/?feature=ytca']

for url in youtube_urls_test:
    m = youtube_url_validation(url)
    if m:
        print('OK {}'.format(url))
        print(m.groups())
        print(m.group(6))  # the video ID
    else:
        print('FAIL {}'.format(url))
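The function above checks a single URL. For the original question (several links inside a post, with a trailing '&feature...' that should disappear), a similar pattern can be used with sub. This is only a rough sketch under my own assumptions: the names YOUTUBE_IN_TEXT and embed_youtube_links, the iframe template, and the choice of allowed ID characters are made up for illustration and are not part of the code above.

```python
import re

# Hypothetical pattern for links embedded in a longer text.
# The optional tail swallows '&feature=...' (or '&amp;feature=...') so the
# whole link, including extra parameters, is replaced.
YOUTUBE_IN_TEXT = re.compile(
    r'(?:https?://)?(?:www\.)?'
    r'(?:youtube(?:-nocookie)?\.(?:com|nl)/watch\?v=|youtu\.be/)'
    r'(?P<video_id>[A-Za-z0-9_-]{11})'   # assumed: IDs are 11 characters
    r'(?:[&?][^\s<>"\']*)?'              # extra parameters, up to whitespace or a quote
)

# Assumed embed markup; swap in whatever HTML 'tag' holds in your code.
DEFAULT_TEMPLATE = '<iframe src="https://www.youtube.com/embed/{video_id}"></iframe>'


def embed_youtube_links(text, template=DEFAULT_TEMPLATE):
    """Replace every YouTube link in text with the filled-in template."""
    def replace(match):
        return template.format(video_id=match.group('video_id'))
    return YOUTUBE_IN_TEXT.sub(replace, text)


post = ('Youtube vid:\n'
        'http://www.youtube.com/watch?v=-JyZLS2IhkQ&feature=related\n'
        'More blabla')
print(embed_youtube_links(post))
```

Using a named group instead of group(6) means the replacement keeps working if the pattern gains or loses a group later.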