Regex to get a filename from a url

Tags: regex

I am trying to write a regex to get the filename from a URL, if one exists.

This is what I have so far:

(?:[^/][\d\w\.]+)+$

So from the URL http://www.foo.com/bar/baz/filename.jpg, I should match filename.jpg.

Unfortunately, I match anything after the last /.

How can I tighten it up so it only grabs it if it looks like a filename?

shenku asked Jan 23 '13 05:01



5 Answers

This one works well for me.

(\w+)(\.\w+)+(?!.*(\w+)(\.\w+)+)
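
As a quick check, here is a minimal sketch in JavaScript. The names lastDottedName and sampleUrls are made up for illustration, and the trailing-slash URL is an assumed test case, not one from the question. Note that when the path contains no file, the pattern falls back to the host name, so on its own it does not tell you whether a filename exists.

```javascript
// Pattern from this answer: the last run of word characters joined by dots.
const lastDottedName = /(\w+)(\.\w+)+(?!.*(\w+)(\.\w+)+)/;

// Hypothetical inputs: the question's URL plus a directory-style URL.
const sampleUrls = [
  "http://www.foo.com/bar/baz/filename.jpg",
  "http://www.foo.com/bar/baz/",
];

// Collect the full match (or null) for each URL.
const results = sampleUrls.map((url) => {
  const match = lastDottedName.exec(url);
  return match ? match[0] : null;
});

console.log(results); // [ 'filename.jpg', 'www.foo.com' ]
```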

deleter1 answered Oct 19 '22 22:10


The examples above fail to get the file name "file-1.name.zip" from this URL:

"http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1"

So I created my own regex version:

[^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))

Explanation:

[^/\\&\?]+          # file name - group of chars without URL delimiters
\.\w{3,4}           # file extension - 3 or 4 word chars
(?=([\?&].*$|$))    # positive lookahead - the file name must be at the end of the string or be followed by query string parameters, which are ignored
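
To see the explanation above in action, here is a rough sketch in JavaScript. The helper name extractFileName is hypothetical, and the last two URLs are extra test inputs I assumed (the middle one comes from the question). Keep in mind that the 3-or-4 character extension rule means shorter or longer extensions will not match.

```javascript
// Pattern from this answer, written as a JavaScript regex literal.
const fileNameWithExt = /[^\/\\&?]+\.\w{3,4}(?=([?&].*$|$))/;

// Hypothetical helper: returns the matched file name, or null when nothing matches.
function extractFileName(url) {
  const match = fileNameWithExt.exec(url);
  return match ? match[0] : null;
}

// File name inside a query string value, followed by another parameter.
console.log(extractFileName("http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1")); // file-1.name.zip

// Plain file name at the end of the path.
console.log(extractFileName("http://www.foo.com/bar/baz/filename.jpg")); // filename.jpg

// Directory-style URL with no file name.
console.log(extractFileName("http://www.foo.com/bar/baz/")); // null
```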

Janeks Malinovskis answered Oct 19 '22 23:10


(?:.+\/)(.+)

Match everything up to the last forward slash (/) and capture everything after it. Use subpattern $1.
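
A small sketch of how the capture group could be read in JavaScript, where subpattern $1 is available as match[1]. The helper name lastSegment and the trailing-slash URL are my own assumptions. As the second call shows, the pattern does not check that the captured text looks like a filename.

```javascript
// Pattern from this answer; group 1 is whatever follows the last slash
// that still leaves at least one character to capture.
const afterLastSlash = /(?:.+\/)(.+)/;

// Hypothetical helper: returns capture group 1, or null if nothing matches.
function lastSegment(url) {
  const match = afterLastSlash.exec(url);
  return match ? match[1] : null;
}

console.log(lastSegment("http://www.foo.com/bar/baz/filename.jpg")); // filename.jpg

// With a trailing slash the engine backtracks to the previous slash.
console.log(lastSegment("http://www.foo.com/bar/baz/")); // baz/
```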

yolo answered Oct 19 '22 23:10


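Non-PCRE (needs an engine with variable-length lookbehind, such as .NET)

(?:[^/][\d\w\.]+)$(?<=\.\w{3,4})

PCRE (each lookbehind alternative must be fixed-length)

(?:[^/][\d\w\.]+)$(?<=(?:\.jpg)|(?:\.pdf)|(?:\.gif)|(?:\.jpeg)|(more_extension))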

Since you are testing with regexpal.com, which is based on JavaScript (older engines don't support lookbehind), try this instead:

(?=\w+\.\w{3,4}$).+
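
Here is a minimal sketch of the lookahead-only variant in JavaScript. The helper name matchFileName is hypothetical, and the second and third URLs are test inputs I assumed. Because \w does not include "-", a hyphenated name is only partially matched, which may or may not matter for your URLs.

```javascript
// Lookahead-only pattern from this answer (no lookbehind needed).
const simpleFileName = /(?=\w+\.\w{3,4}$).+/;

// Hypothetical helper: returns the full match, or null when nothing matches.
function matchFileName(url) {
  const match = simpleFileName.exec(url);
  return match ? match[0] : null;
}

console.log(matchFileName("http://www.foo.com/bar/baz/filename.jpg")); // filename.jpg

// No file name at the end, so the lookahead never succeeds.
console.log(matchFileName("http://www.foo.com/bar/baz/")); // null

// The match starts after the hyphen because "-" is not a word character.
console.log(matchFileName("http://www.foo.com/bar/my-file.jpg")); // file.jpg
```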

slier answered Oct 19 '22 23:10


I'm using this:

(?<=\/)[^\/\?#]+(?=[^\/]*$)

Explanation:

(?<=): positive lookbehind, asserting that the match is preceded by this expression without including it in the match.

(?<=/): positive lookbehind for the literal forward slash "/", meaning I'm looking for an expression that is preceded by a forward slash but does not include it.

[^/\?#]+: one or more characters that are not "/", "?" or "#", which strips search params and the hash.

(?=[^/]*$): positive lookahead for any run of non-slash characters followed by the end of the line. This ensures that the segment after the last forward slash is selected.

Example usage:

// Matches the last path segment, excluding any query string or hash.
const urlFileNameRegEx = /(?<=\/)[^\/\?#]+(?=[^\/]*$)/;

const testCases = [
  "https://developer.mozilla.org/en-US/docs/Web/API/MutationObserverInit#yo",
  "https://developer.mozilla.org/static/fonts/locales/ZillaSlab-Regular.subset.bbc33fb47cf6.woff2",
  "https://developer.mozilla.org/static/build/styles/locale-en-US.520ecdcaef8c.css?is-nice=true"
];

// exec() returns null when nothing matches (e.g. a URL ending in "/"), so guard before indexing.
testCases.forEach(testStr => {
  const match = urlFileNameRegEx.exec(testStr);
  console.log(`The file of ${testStr} is ${match ? match[0] : "(none)"}`);
});

deckele answered Oct 19 '22 22:10