 

Regex to find a URL in text

I need to find the first URL in a text using a regular expression.

For example:

I love this website:http://www.youtube.com/music it's fantastic

or

[ e.g. http://www.youtube.com/music] text
asked Mar 28 '11 15:03 by M4rk
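For reference, the naive approach looks something like the JavaScript sketch below. The pattern and the `naiveFirstUrl` helper are hypothetical illustrations, not taken from any of the answers: the pattern only looks for `http://`, `https://` or `www.` followed by characters that are not whitespace or `]`. That is enough for the two examples in the question, but none of the edge cases the answers discuss.

```javascript
// Hypothetical, deliberately simple pattern: a scheme or "www." followed by
// a run of characters that are neither whitespace nor a closing bracket.
const NAIVE_URL = /(?:https?:\/\/|www\.)[^\s\]]+/i;

// Returns the first match, or null if the text contains no candidate.
function naiveFirstUrl(text) {
  const m = text.match(NAIVE_URL);
  return m ? m[0] : null;
}

console.log(naiveFirstUrl("I love this website:http://www.youtube.com/music it's fantastic"));
// http://www.youtube.com/music
console.log(naiveFirstUrl("[ es. http://www.youtube.com/music] text"));
// http://www.youtube.com/music
```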


4 Answers

I looked into this issue last year and developed a solution that you may want to look at; see URL Linkification (HTTP/FTP). That link is a test page for the JavaScript solution, with many examples of difficult-to-linkify URLs.

My regex solution, written for both PHP and JavaScript, is not simple (but, as it turns out, neither is the problem). For more information, I would also recommend reading:

The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber

The comments following Jeff's blog post are a must-read if you want to do this right.
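Much of Atwood's post is about parentheses: a URL such as `http://foo.com/blah_(wikipedia)` legitimately ends in `)`, while a URL written inside parentheses does not. One common workaround, sketched below in JavaScript under the assumption that only `http://` and `https://` links matter, is to match liberally up to the next whitespace and then trim trailing punctuation and any unbalanced closing parenthesis. The helper names are hypothetical, and this is not the PHP/JavaScript solution linked above.

```javascript
// Hypothetical sketch: match liberally, then trim characters that are
// probably sentence punctuation rather than part of the URL.
const TRAILING_PUNCTUATION = ".,;:!?'\"";

// Number of occurrences of the single character ch in str.
function count(str, ch) {
  return str.split(ch).length - 1;
}

function trimUrl(candidate) {
  let url = candidate;
  while (url.length > 0) {
    const last = url[url.length - 1];
    if (TRAILING_PUNCTUATION.includes(last)) {
      url = url.slice(0, -1); // e.g. the full stop ending a sentence
    } else if (last === ")" && count(url, ")") > count(url, "(")) {
      url = url.slice(0, -1); // unbalanced ")" belongs to the surrounding text
    } else {
      break;
    }
  }
  return url;
}

function findFirstUrl(text) {
  const m = text.match(/https?:\/\/\S+/i);
  return m ? trimUrl(m[0]) : null;
}

console.log(findFirstUrl("(Something like http://foo.com/blah_blah)"));
// http://foo.com/blah_blah
console.log(findFirstUrl("See http://foo.com/blah_blah_(wikipedia)."));
// http://foo.com/blah_blah_(wikipedia)
```

Counting parentheses rather than extending the regex keeps the pattern readable, at the cost of a second pass over each candidate.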

Note that this question gets asked a lot. Maybe do a search next time :)

answered Sep 16 '22 16:09 by ridgerunner


You can't do this perfectly with a regular expression. You may be interested in this blog post. There is a bit more information on Regex Guru, but even those patterns look very fragile. You will need additional checks outside of your regular expression to catch the edge cases.
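One way to add such checks, sketched here in JavaScript under the assumption that the WHATWG `URL` class is available (modern browsers and recent Node.js versions provide it globally), is to let a loose regex propose candidates and accept the first one the URL parser can parse. The extra rules (http/https only, the hostname must contain a dot, a bare `www.` gets an `http://` prefix) and the name `firstValidUrl` are illustrative assumptions, not a standard.

```javascript
// Hypothetical helper: regex proposes candidates, the URL parser validates them.
function firstValidUrl(text) {
  // Loose candidate pattern; trailing punctuation is NOT trimmed here.
  const candidates = text.match(/\b(?:https?:\/\/|www\.)\S+/gi) || [];
  for (const raw of candidates) {
    // Assumption: a bare "www." candidate is meant to be an http URL.
    const withScheme = /^www\./i.test(raw) ? "http://" + raw : raw;
    let parsed;
    try {
      parsed = new URL(withScheme);
    } catch {
      continue; // the parser rejected it; try the next candidate
    }
    const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
    // Assumption: reject single-label hosts such as "localhost".
    if (isHttp && parsed.hostname.includes(".")) {
      return parsed.href;
    }
  }
  return null;
}

console.log(firstValidUrl("http://localhost/x and www.example.com/page"));
// http://www.example.com/page
```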

answered Sep 20 '22 16:09 by asthasr


Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript: https://github.com/ljosa/urlize.js

It actually would not pick up the URL in your example because there is a colon right before the URL. But if we modify the example a little:

urlize("I love this website: http://www.youtube.com/music it's fantastic", true, true)
=> 'I love this website: <a href="http://www.youtube.com/music" rel="nofollow">http://www.youtube.com/music</a> it&#39;s fantastic'

Note the second argument, which, if true, inserts rel="nofollow", and the third argument, which, if true, escapes characters that have special meaning in HTML.
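To make the effect of those two flags concrete, here is a small hypothetical JavaScript function, `linkifyFirst(text, nofollow, escapeHtml)`, that mimics them for the first `http(s)` URL only. It is not the urlize.js code and does none of urlize's punctuation or partial-URL handling; it only illustrates what `rel="nofollow"` and HTML escaping change in the output.

```javascript
// Hypothetical sketch, not the urlize.js implementation.
// Escapes the five characters that are special in HTML text and attributes.
function escapeHtmlText(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Wraps the first http(s) URL in an <a> tag.
// nofollow: add rel="nofollow"; escapeHtml: escape the surrounding text and link text.
// The href attribute is always escaped so the generated tag stays well-formed.
function linkifyFirst(text, nofollow, escapeHtml) {
  const esc = escapeHtml ? escapeHtmlText : (s) => s;
  const m = /https?:\/\/[^\s<>"']+/i.exec(text);
  if (!m) return esc(text);
  const url = m[0];
  const before = text.slice(0, m.index);
  const after = text.slice(m.index + url.length);
  const rel = nofollow ? ' rel="nofollow"' : "";
  return (
    esc(before) +
    '<a href="' + escapeHtmlText(url) + '"' + rel + ">" +
    esc(url) + "</a>" + esc(after)
  );
}

console.log(linkifyFirst("I love this website: http://www.youtube.com/music it's fantastic", true, true));
// I love this website: <a href="http://www.youtube.com/music" rel="nofollow">http://www.youtube.com/music</a> it&#39;s fantastic
```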

answered Sep 20 '22 16:09 by Vebjorn Ljosa


This might work:

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

I found it somewhere (it appears to be the original version of John Gruber's pattern mentioned in another answer).

It will find links such as:

http://foo.com/blah_blah/

(Something like http://foo.com/blah_blah)

http://foo.com/blah_blah_(wikipedia)

Hope this works.
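JavaScript regular expressions do not support POSIX classes such as `[:punct:]`, so the pattern above cannot be pasted into JavaScript unchanged. The sketch below is an adaptation, not the original pattern: it spells `[:punct:]` out as the four ASCII punctuation ranges, so non-ASCII punctuation is not excluded. The `firstUrl` helper is hypothetical.

```javascript
// Adapted sketch of the pattern above: [^[:punct:]\s] becomes
// [^\s!-\/:-@\[-`{-~], i.e. "not whitespace and not ASCII punctuation".
const URL_PATTERN = /\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^\s!-\/:-@\[-`{-~]|\/)))/;

// Returns the first (leftmost) match, or null.
function firstUrl(text) {
  const m = URL_PATTERN.exec(text);
  return m ? m[0] : null;
}

const samples = [
  "http://foo.com/blah_blah/",
  "(Something like http://foo.com/blah_blah)",
  "http://foo.com/blah_blah_(wikipedia)",
  "I love this website:http://www.youtube.com/music it's fantastic",
];
for (const s of samples) {
  console.log(firstUrl(s));
}
// http://foo.com/blah_blah/
// http://foo.com/blah_blah
// http://foo.com/blah_blah_(wikipedia)
// http://www.youtube.com/music
```

The final group is what drops a trailing `)` or `]`: the match must end on a balanced `(...)`, a `/`, or a character that is not punctuation, so the regex backtracks past a closing bracket that belongs to the surrounding text.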

answered Sep 19 '22 16:09 by Shashank Agarwal