Regex to match URL end-of-line or "/" character

Tags: regex

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:

http://server/xyz/2008-10-08-4
http://server/xyz/2008-10-08-4/
http://server/xyz/2008-10-08-4/123/more

But not match something like this:

http://server/xyz/2008-10-08-4-1

So, I thought my best bet was something like this:

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]

where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?

559

asked Oct 06 '08 16:10

Chris Farmer

2 Answers

To match either / or end of content, use (/|\z)

This applies only when you are matching a single URL rather than a newline-delimited list of URLs: \z matches at the very end of the input, not at the end of each line.

To put that with an updated version of what you had:

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)

Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ ).
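
For anyone who wants to try this, here is a minimal sketch in Python, which is an assumption on my part since the question doesn't name a language. Python's re module spells "very end of the string" as \Z, so that replaces \z here. The parse helper is a hypothetical name used only for illustration.

```python
import re

# Same pattern as above, with Python's \Z in place of \z.
# The trailing group is kept so the pattern stays close to the answer's form.
PATTERN = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\Z)")

def parse(url):
    """Return (prefix, date, number) for a matching URL, or None."""
    m = PATTERN.search(url)
    return m.group(1, 2, 3) if m else None

for url in [
    "http://server/xyz/2008-10-08-4",
    "http://server/xyz/2008-10-08-4/",
    "http://server/xyz/2008-10-08-4/123/more",
    "http://server/xyz/2008-10-08-4-1",
]:
    print(url, "->", parse(url))
```

The first three URLs should give the date 2008-10-08 and the number 4, and the last one should give None. Because the search starts at the first slash of "http://", the first group comes back as "/server/xyz" with a leading slash; anchor the pattern differently if that matters to you.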

149

answered Oct 05 '22 16:10

Peter Boughton

You've got a couple regexes now which will do what you want, so that's adequately covered.

What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ., /, and ^ anywhere but the first position) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
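
A quick way to see the difference, sketched in Python (my choice of language, not something the question specifies). The two pattern names are made up for this illustration, and only the tail of the original regex is used to keep it short.

```python
import re

# Tail of the original attempt: the class matches a literal "/" or "$".
char_class = re.compile(r"-(\d+)[/$]")
# Alternation instead: "/" or the end-of-string anchor.
alternation = re.compile(r"-(\d+)(?:/|$)")

# The class needs an actual character after the digits...
print(bool(char_class.search("2008-10-08-4")))   # False
# ...and a literal dollar sign satisfies it.
print(bool(char_class.search("2008-10-08-4$")))  # True
# Outside a class, $ acts as an anchor again.
print(bool(alternation.search("2008-10-08-4")))  # True
```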

answered Oct 05 '22 16:10

Dave Sherohman
