Regex: Substring the second last value between two slashes of a url string

I have a string like this:

http://www.example.com/value/1234/different-value

How can I extract the 1234?

Note: There may be a slash at the end:

http://www.example.com/value/1234/different-value
http://www.example.com/value/1234/different-value/
asked Dec 01 '11 17:12 by Daxon


2 Answers

/([^/]+)(?=/[^/]+/?$)

should work. You might need to escape or delimit it differently depending on the language you're using. For example, in Ruby:

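subject = 'http://www.example.com/value/1234/different-value/'  # example input
if subject =~ /\/([^\/]+)(?=\/[^\/]+\/?\Z)/
    match = $~[1]  # capture group 1: the second-to-last path segment
else
    match = ""
end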
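In Ruby the same pattern can also be passed to String#[] together with a capture-group index, which returns the captured text directly, or nil when there is no match. This is a minimal sketch; the method name second_last_segment is hypothetical, and it uses \z (absolute end of string) instead of \Z, which would also allow a final newline. Because it works on the raw string, a URL with only one path segment would return the host instead.

```ruby
# Hypothetical helper: returns the segment before the last one, or nil if the
# pattern does not match at all.
def second_last_segment(url)
  # Group 1 is a run of non-slash characters preceded by "/", followed by
  # exactly one more segment and an optional trailing slash at the very end.
  url[%r{/([^/]+)(?=/[^/]+/?\z)}, 1]
end

puts second_last_segment('http://www.example.com/value/1234/different-value')   # prints 1234
puts second_last_segment('http://www.example.com/value/1234/different-value/')  # prints 1234
```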
answered Oct 05 '22 16:10 by Tim Pietzcker


Use Slice for Positional Extraction

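If you always want to extract the element at index 4 of the array produced by splitting the URI on slashes (index 0 is the scheme "http:" and index 1 is the empty string between the two slashes of "//"), and are confident that your data is regular, you can use Array#slice as follows.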

'http://www.example.com/value/1234/different-value'.split('/').slice 4
#=> "1234"

'http://www.example.com/value/1234/different-value/'.split('/').slice 4
#=> "1234"

This works whether or not there's a trailing slash (String#split drops trailing empty strings), whether or not more segments follow, and whether or not the element is strictly numeric. It works because it relies on the element's position within the path rather than on its contents. However, you will get nil if you try to parse a URI with fewer elements, such as http://www.example.com/1234/.
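If the value you want is always the second-to-last segment, as in the question, the same positional idea can count from the end of the path instead of the start. The sketch below is an alternative rather than part of this answer: it assumes Ruby's standard uri library is available so that the host and any query string are excluded before splitting, and the method name second_last_path_segment is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: second-to-last segment of the URL's path, or nil.
def second_last_path_segment(url)
  # URI#path excludes scheme, host and query, e.g. "/value/1234/different-value/".
  # split('/') drops trailing empty strings; reject removes the leading "".
  # Array#[] with an out-of-range negative index returns nil.
  URI(url).path.split('/').reject(&:empty?)[-2]
end

puts second_last_path_segment('http://www.example.com/value/1234/different-value')
puts second_last_path_segment('http://www.example.com/value/1234/different-value/?page=2')
p second_last_path_segment('http://www.example.com/1234/')  # prints nil
```

Counting from the end avoids hard-coding index 4, at the cost of assuming the target is always exactly one segment before the last.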

Use Scan/Match for Pattern Extraction

Alternatively, if you know that the element you're looking for is always the only one composed entirely of digits, you can use String#match with look-arounds to extract just the numeric portion of the string.

'http://www.example.com/value/1234/different-value'.match %r{(?<=/)\d+(?=/)}
#=> #<MatchData "1234">

$&
#=> "1234"

The look-behind and look-ahead assertions anchor the expression to a complete path segment. Without them, you would also match digits elsewhere in the URL, such as the 3 in a host like w3.example.com. This approach is better if the position of the target element may change, provided you can guarantee that your element of interest is the only one that matches the anchored regex.
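To illustrate why the assertions matter, this short sketch compares an unanchored \d+ with the anchored pattern on a URL whose host contains a digit. The w3.example.com URL is only an illustrative input.

```ruby
url = 'http://w3.example.com/value/1234/different-value'

# Unanchored: the first run of digits anywhere in the string wins.
unanchored = url[/\d+/]

# Anchored: the digits must have "/" immediately before and after them,
# so they must make up an entire path segment.
anchored = url[%r{(?<=/)\d+(?=/)}]

puts unanchored  # prints 3
puts anchored    # prints 1234
```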

If there will be more than one match (e.g. http://www.example.com/1234/5678/) then you might want to use String#scan instead to select the first or last match. This is one of those "know your data" things; if you have irregular data, then regular expressions aren't always the best choice.
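For the multiple-match case, String#scan returns every match as an array, so choosing the first or last one is an ordinary array operation. A short sketch using the example URL from the paragraph above and the same anchored pattern:

```ruby
url = 'http://www.example.com/1234/5678/'

# scan collects every all-digit path segment, in order of appearance.
ids = url.scan(%r{(?<=/)\d+(?=/)})

p ids        # prints ["1234", "5678"]
p ids.first  # prints "1234"
p ids.last   # prints "5678"
```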

answered Oct 05 '22 16:10 by Todd A. Jacobs