Auto-link URLs with a JavaScript regex

I have a paragraph of text which may contain some links in plain text, and some links which are already marked up as HTML.

For example:

Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>

I need to fish out the unformatted links from this piece of text. So I am looking for a regular expression that will match the first case, but not the second or third case, because those are already well-formatted links.

I've managed to fish out all the links with this regex: ((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+). However, I am still having trouble distinguishing between the cases.

This needs to be in JavaScript, so I don't think negative lookbehind is allowed.

Any help would be appreciated.

EDIT: I'm trying to wrap the fished out unformatted links in an a tag.

l3utterfly asked Apr 20 '15 12:04



1 Answer

You can use this regex to get URLs outside of tags:

(?![^<]*>|[^<>]*<\/)((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+)
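The negative lookahead at the front does the filtering. It rejects a candidate URL if a `>` appears before the next `<` (the URL sits inside a tag, such as an attribute value), or if the next tag after the URL is a closing tag (the URL is the text content of an element). A minimal sketch that checks this against a shortened version of the question's sample; the names `urlOutsideTags`, `sample`, and `found` are made up for the illustration, and the `g` flag is added so that `match()` returns every hit:

```javascript
// Pattern from the answer, with the g flag added for this demo.
const urlOutsideTags = /(?![^<]*>|[^<>]*<\/)((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+)/g;

// Hypothetical sample: one bare URL, one URL in an attribute, one already linked.
const sample =
  'Plain: http://test.com, image <img src="http://test.com/2.jpg" />, ' +
  'anchor <a href="http://test.com/test.html">http://test.com/test.html</a>';

// With a global regex, match() returns an array of full matches, or null.
const found = sample.match(urlOutsideTags) || [];

// Only the bare URL should survive the lookahead.
console.log(found);
```

Note that the check only looks at the next tag after the URL, so it is a heuristic for simple markup like the example, not a full HTML parser.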

We can also shorten it a bit by using the case-insensitive i flag:

(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)

Sample code:

// Expects this element in the page: <div id="res"></div>
var re = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
var str = 'Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>';
var out = document.getElementById("res");

// exec() returns the first match, or null when there is none
var val = re.exec(str);
out.innerHTML = "<b>URL Found</b>: " + (val ? val[1] : "none");

// $1 is the whole URL (first capture group); replace() with a global regex rewrites every match
var subst = '<a href="$1">$1</a>';
var result = str.replace(re, subst);
out.innerHTML += "<br><b>Replacement Result</b>: " + result;
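Because the character class includes `.` and `?`, a URL at the end of a sentence will pull the closing punctuation into the link. One possible way around that, sketched here outside the browser, is to use a replacement callback that trims the trailing punctuation before building the tag. The function name `autolink` and the trimming rule are assumptions made for this illustration, not part of the answer above:

```javascript
// Same pattern as in the sample code above.
const URL_OUTSIDE_TAGS = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;

// Hypothetical helper: wraps bare URLs in <a> tags and keeps
// trailing '.' or '?' characters outside the generated link.
function autolink(html) {
  return html.replace(URL_OUTSIDE_TAGS, (match) => {
    const url = match.replace(/[.?]+$/, '');   // URL without trailing punctuation
    const trailing = match.slice(url.length);  // the punctuation that was trimmed
    return `<a href="${url}">${url}</a>${trailing}`;
  });
}

const linked = autolink(
  'Read http://test.com/docs. Then <a href="http://test.com">http://test.com</a>'
);
console.log(linked);
```

Here the first URL is wrapped and the sentence's full stop stays outside the anchor, while the existing `<a>` element is left untouched.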

Update:

To allow capturing inside specific tags, you can whitelist them like this:

var re = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
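The extra inner lookahead `(?!(?:p|pre)>)` stops the "followed by a closing tag" rule from firing when that closing tag is `</p>` or `</pre>`. A small comparison sketch, assuming a made-up input string and the illustrative names `strict`, `allowPAndPre`, and `html`:

```javascript
// Pattern without the whitelist, and the whitelisted variant from the update.
const strict = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
const allowPAndPre = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;

// Hypothetical input: a bare URL inside <p>, plus an existing link.
const html =
  '<p>Docs at http://test.com/docs here</p> ' +
  '<a href="http://test.com">http://test.com</a>';

// match() returns null when nothing matches, so fall back to an empty array.
const strictHits = html.match(strict) || [];
const whitelistHits = html.match(allowPAndPre) || [];

console.log(strictHits);     // the URL inside <p> is skipped by the strict pattern
console.log(whitelistHits);  // the whitelisted pattern picks it up
```

In both cases the URL inside the existing `<a>` element is skipped, since `</a>` is not on the whitelist.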

Wiktor Stribiżew answered Oct 25 '22 06:10