Python regex: extract Vimeo ID from URL

embed_url = 'http://www.vimeo.com/52422837'
response = re.search(r'^(http://)?(www\.)?(vimeo\.com/)?([\/\d+])', embed_url)
return response.group(4)

The response is:

5

I was hoping for

52422837

Does anybody have an idea? I'm really bad with regexes :S

Jeroen Gerits asked Mar 08 '13 14:03



2 Answers

Don't reinvent the wheel!

>>> import urlparse  # Python 2 module; on Python 3 use urllib.parse instead
>>> urlparse.urlparse('http://www.vimeo.com/52422837')
ParseResult(scheme='http', netloc='www.vimeo.com', path='/52422837', params='',
query='', fragment='')

>>> urlparse.urlparse('http://www.vimeo.com/52422837').path.lstrip("/")
'52422837'
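On Python 3 the same function is available as urllib.parse.urlparse. A minimal sketch of this approach wrapped in a helper might look like the following; the function name vimeo_id_from_url and the digits-only check on the last path segment are assumptions added for illustration, not part of the answer above:

```python
from urllib.parse import urlparse


def vimeo_id_from_url(url):
    """Return the numeric video ID from a Vimeo URL, or None if there is none.

    Hypothetical helper: the name and the digits-only rule are assumptions.
    """
    # For 'http://www.vimeo.com/52422837' the path is '/52422837'.
    path = urlparse(url).path
    # Keep the non-empty path segments and look at the last one.
    segments = [s for s in path.split("/") if s]
    if segments and segments[-1].isdigit():
        return segments[-1]
    return None


print(vimeo_id_from_url("http://www.vimeo.com/52422837"))  # prints 52422837
```

Checking that the segment is all digits avoids returning something like a channel name when the URL does not end in a video ID.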
Colonel Panic answered Nov 09 '22 08:11



Use \d+ (no brackets) to match one or more digits; the literal slash is already matched by the optional (vimeo\.com/) group:

response = re.search(r'^(http://)?(www\.)?(vimeo\.com/)?(\d+)', embed_url)

Result:

>>> re.search(r'^(http://)?(www\.)?(vimeo\.com/)?(\d+)', embed_url).group(4)
'52422837'

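You were using a character class ([...]) where none was needed. The pattern [\/\d+] matches exactly one character: a /, a + or a digit. Because the slash had already been consumed by (vimeo\.com/), group 4 captured only the first digit, 5.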
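The difference can be seen side by side in a short sketch. It reuses the URL and patterns from the question; the variable names prefix, one_char and all_digits are made up for this illustration:

```python
import re

embed_url = "http://www.vimeo.com/52422837"
# Shared optional prefix from the question; it defines groups 1 to 3.
prefix = r"^(http://)?(www\.)?(vimeo\.com/)?"

# Character class: [\/\d+] consumes a single character that is '/', '+' or a digit.
one_char = re.search(prefix + r"([\/\d+])", embed_url).group(4)

# Quantified digit: \d+ consumes a run of one or more digits.
all_digits = re.search(prefix + r"(\d+)", embed_url).group(4)

print(one_char)    # prints 5
print(all_digits)  # prints 52422837
```

Inside square brackets the + is just a literal plus sign, which is why it has no repeating effect there.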

Martijn Pieters answered Nov 09 '22 07:11
