
How to extract slug from URL with regular expression in Python?

Tags: python, regex

I'm struggling with Python's re. I don't know how to solve the following problem in a clean way.

I want to extract a part of a URL.

What I tried so far:

import re

url = 'http://www.example.com/this-2-me-4/123456-subj'
m = re.search('/[0-9]+-', url)
m = m.group(0).rstrip('-')
m = m.lstrip('/')

This leaves me with the desired output 123456, but I feel this is not the proper way to extract the slug.

How can I solve this quicker and cleaner?

mcbetz asked Mar 20 '23 10:03



1 Answer

Use a capturing group: put parentheses (...) around the part of the regex that you want to capture. You can then retrieve the contents of a group by passing its number as an argument to m.group():

>>> m = re.search('/([0-9]+)-', url)
>>> m.group(1)
'123456'
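One thing the snippet above does not cover: re.search returns None when nothing matches, so calling m.group(1) on a URL without a number would raise an AttributeError. A minimal sketch of a guarded version follows. The helper name extract_id and the second test URL are made up for illustration; the pattern is the one from this answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
ID_PATTERN = re.compile(r'/([0-9]+)-')


def extract_id(url):
    """Return the digits between a '/' and the next '-', or None if absent.

    extract_id is a hypothetical helper name, not part of the re module.
    """
    m = ID_PATTERN.search(url)
    # re.search gives None on no match, so check before calling .group()
    return m.group(1) if m else None


print(extract_id('http://www.example.com/this-2-me-4/123456-subj'))  # 123456
print(extract_id('http://www.example.com/no-id-here'))               # None
```

The return value is a string; wrap it in int() if you need a number.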

From the docs:

(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
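To make the quoted passage concrete, here is a small sketch of the other two things it mentions: a \number backreference and matching literal parentheses. The sample strings are invented for illustration and have nothing to do with the URL in the question.

```python
import re

# \1 matches again whatever group 1 captured earlier in the same match.
repeated = re.search(r'(\w+)-\1', 'this-this-that')
print(repeated.group(0))  # this-this
print(repeated.group(1))  # this

# Escaped parentheses match literal '(' and ')'; the inner pair still captures.
escaped = re.search(r'\(([0-9]+)\)', 'id (42)')
print(escaped.group(1))  # 42

# A character class works as well for the literal characters.
in_class = re.search(r'[(]([0-9]+)[)]', 'id (42)')
print(in_class.group(1))  # 42
```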

Stefan van den Akker answered Apr 06 '23 12:04
