
JavaScript Regex to match a URL in a field of text

How can I set up a regex to test whether a block of text contains a URL in JavaScript? I can't quite figure out the pattern to use to accomplish this.

    var urlpattern = new RegExp("(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");
    var txtfield = $('#msg').val(); /* this is a textarea */
    if (urlpattern.test(txtfield)) {
        // do something about it
    }

EDIT:

The pattern I have now works in regex testers for what I need it to do, but Chrome throws an error:

  "Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,@?^=%&:/~+#]*[w-@?^=%&/~+#])?/: Range out of order in character class" 

for the following code:

var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?' ); 
BillPull asked Nov 18 '11 20:11



2 Answers

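Escaping the dash characters (which act as range specifiers inside a character class) should work, but in your code the escapes never reach the regex engine: inside a quoted string, JavaScript drops the backslash from sequences such as \w and \-, which is why the error message shows [w-_], a range that runs backwards. If you build the regex from a string, double each backslash. Another way to take away the dash's special meaning is to put it at the beginning or the end of the class definition.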

In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
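
The error message in the question hints at the cause: the backslashes are gone by the time the regex engine sees the pattern. Here is a minimal sketch (the variable names are illustrative, not from the question) of how a quoted string turns [\w\-_] into [w-_], and two ways around it:

```javascript
// In a quoted string, "\w" and "\-" are not recognized string escapes,
// so JavaScript silently drops the backslashes.
const fromString = "[\w\-_]+";
console.log(fromString); // [w-_]+

// The constructor now sees the range w-_ (char code 119 down to 95) and throws.
let errorName = null;
try {
  new RegExp(fromString);
} catch (e) {
  errorName = e.name;
}
console.log(errorName); // SyntaxError

// Doubling the backslashes keeps them in the string; moving the dash to the
// end of the class removes the need to escape it at all.
const escaped = new RegExp("[\\w\\-_]+");
const dashLast = new RegExp("[\\w_-]+");
console.log(escaped.test("my-host_1"), dashLast.test("my-host_1")); // true true
```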

I would recommend the following regex for your purposes:

(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])? 

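This can be specified in JavaScript either by passing it to the RegExp constructor, the approach used in your example (every backslash must be doubled, because the string literal would otherwise strip it before the regex engine sees the pattern):

    var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?");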

or by directly specifying a regex literal, using the // quoting method:

var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/ 

The RegExp constructor is necessary if you receive the regex as a string (from user input or an AJAX call, for instance), and it can be more readable when the pattern contains many forward slashes, since they need no escaping there. I am fairly certain that the // quoting method is more efficient, and it is at times the more readable of the two. Both work.
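
As a rough check that the two forms are interchangeable, the sketch below builds the pattern both ways and compares the results on a few sample strings (the samples are made up for illustration). It assumes the constructor string doubles every backslash, since a quoted string with single backslashes would lose them before the regex engine sees the pattern.

```javascript
// Constructor form: backslashes doubled, forward slashes left alone.
const fromConstructor = new RegExp(
  "(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
);

// Literal form: single backslashes, forward slashes escaped.
const fromLiteral =
  /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Hypothetical sample inputs.
const samples = [
  "see http://www.example.com/page?id=1 for details",
  "no links in this message",
  "ttp://www.example.com is missing its h",
];

// One [constructorResult, literalResult] pair per sample.
const results = samples.map((text) => [
  fromConstructor.test(text),
  fromLiteral.test(text),
]);
console.log(results); // [ [ true, true ], [ false, false ], [ false, false ] ]
```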

I tested your original and this modification using Chrome both on JSFiddle and on RegexLib.com, using the Client-Side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

Edit

As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of URL is desired (which it may or may not be), the following might be more appropriate:

(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
#------changed----here-------------^

<End Edit>
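
To make the difference concrete, here is a small sketch comparing the two patterns on a single-word host and on a dotted host. The input strings and variable names are hypothetical, and both patterns are written as regex literals.

```javascript
// Original pattern: at least one ".label" must follow the first host label.
const dotted =
  /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Relaxed pattern: the ".label" group may repeat zero times.
const singleWord =
  /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

const local = "open http://localhost/admin first";        // hypothetical input
const remote = "open http://www.example.com/admin first"; // hypothetical input

console.log(dotted.test(local), singleWord.test(local));   // false true
console.log(dotted.test(remote), singleWord.test(remote)); // true true

// The relaxed pattern still captures the whole URL, path included.
const localMatch = local.match(singleWord)[0];
console.log(localMatch); // http://localhost/admin
```

The trade-off is that the relaxed pattern accepts any single word after the scheme, so it will also flag strings such as http://a that are unlikely to be real addresses.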

Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!

Code Jockey answered Sep 21 '22 17:09


Complete Multi URL Pattern.

UPDATED: Nov. 2020, April & June 2021 (Thanks commenters)

Matches every URI or URL in a string! Also extracts the protocol, domain, path, query and hash.

    ([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)

https://regex101.com/r/jO8bC4/56

Example JS code with output - every URL is turned into a 5-part array of its 'parts' (protocol, host, path, query, and hash)

    var re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/mig;
    var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
    var m;

    while ((m = re.exec(str)) !== null) {
        // Guard against zero-length matches, which would otherwise loop forever.
        if (m.index === re.lastIndex) {
            re.lastIndex++;
        }
        console.log(m);
    }

Will give you the following:

    ["https://www.facebook.com", "https://", "www.facebook.com", "", "", ""]
    ["https://github.com/justsml?tab=activity#top", "https://", "github.com", "/justsml", "tab=activity", "top"]
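
If named fields are easier to work with than array positions, the same pattern can be wrapped in a small helper. This is only a sketch: extractUrls and its field names are hypothetical, and it assumes an environment that supports String.prototype.matchAll (ES2020).

```javascript
const urlParts =
  /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/gi;

// Hypothetical helper: returns one object per URL found in the text.
function extractUrls(text) {
  return Array.from(text.matchAll(urlParts), (m) => ({
    url: m[0],
    protocol: m[1],
    host: m[2],
    path: m[3],
    query: m[4],
    hash: m[5],
  }));
}

const found = extractUrls(
  "docs at https://github.com/justsml?tab=activity#top today"
);
console.log(found.length);   // 1
console.log(found[0].host);  // github.com
console.log(found[0].query); // tab=activity
console.log(found[0].hash);  // top
```

Because matchAll works on an internal copy of the regex, the helper can be called repeatedly without resetting lastIndex by hand.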
Dan Levy answered Sep 24 '22 17:09