I have a paragraph of text which may contain some links in plain text, or some links which are actually links.
For example:
Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>
I need to fish out the unformatted links from this piece of text. So I need a regular expression that will match the first case, but not the second or third, because those are already well-formed links.
I've managed to fish out all the links with this regex:
((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+)
However, I am still having trouble distinguishing between the cases.
This needs to be in JavaScript, so I don't think negative lookbehind is allowed.
Any help would be appreciated.
EDIT: I'm trying to wrap the fished-out unformatted links in an <a> tag.
autolink-js is a small (about half a kilobyte), simple, and tested JavaScript tool that takes a string of text, finds URLs within it, and hyperlinks them.
We should consider many cases in URL detection, so the regex should be more complicated. – Kang Andrew Jul 8 '19 at 5:56
All it does is go through a paragraph of text and check for a match for the regex (our presumed URL). If it finds a match, it puts any text before the link into an array, marked as type text, and then adds the link itself as type link (stripping the http:// or https:// prefix for consistency's sake).
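A minimal sketch of that splitting approach might look like the following. The function name tokenizeLinks and the { type, value } token shape are hypothetical, chosen only for illustration; the URL pattern is the one from the question, and it does not yet skip URLs that sit inside tags.

```javascript
// Hypothetical helper: the name and the token shape are illustrative only.
function tokenizeLinks(text) {
  // URL pattern from the question, with the i flag; no tag awareness here.
  var re = /(https?:)\/\/[a-z0-9&#=.\/\-?_]+/gi;
  var tokens = [];
  var last = 0; // index just past the previous match
  var m;
  while ((m = re.exec(text)) !== null) {
    // Any text between the previous match and this one is a "text" token.
    if (m.index > last) {
      tokens.push({ type: 'text', value: text.slice(last, m.index) });
    }
    // Store the link without its http:// or https:// prefix.
    tokens.push({ type: 'link', value: m[0].replace(/^https?:\/\//i, '') });
    last = re.lastIndex;
  }
  // Trailing text after the final match, if any.
  if (last < text.length) {
    tokens.push({ type: 'text', value: text.slice(last) });
  }
  return tokens;
}

var tokens = tokenizeLinks('See http://test.com, ok');
// tokens: text "See ", link "test.com", text ", ok"
```

Keeping the result as a token list rather than a string lets the caller decide how to render each link.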
Some text may contain links in plain text, and this method would convert such links into clickable hyperlinks by adding the anchor tag.
You can use this regex to get URLs outside of tags:
(?![^<]*>|[^<>]*<\/)((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+)
We can shorten it a bit, too, with the i (case-insensitive) option:
(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)
Sample code:
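// Match URLs that are neither inside a tag nor directly followed by a closing tag.
var re = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
var str = 'Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>';
// exec() returns the first match; group 1 holds the whole URL.
var val = re.exec(str);
document.getElementById("res").innerHTML = "<b>URL Found</b>: " + val[1];
// Wrap each plain-text URL in an anchor. replace() resets lastIndex on a global regex, so the exec() call above does not affect it.
var subst = '<a href="$1">$1</a>';
var result = str.replace(re, subst);
document.getElementById("res").innerHTML += "<br><b>Replacement Result</b>: " + result;
<!-- Markup for the demo; it must exist in the page before the script runs. -->
<div id="res"></div>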
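For use outside a browser demo, the same replacement can be sketched as a plain function with no DOM access. The name linkify is hypothetical. The comments spell out what each alternative in the negative lookahead is doing, which is a heuristic about the characters that follow the URL rather than real HTML parsing.

```javascript
// Hypothetical wrapper around the regex above; no DOM required.
function linkify(html) {
  // (?! ... ) rejects a URL when either alternative matches after it:
  //   [^<]*>     a ">" appears before any "<", so the URL is inside a tag
  //              (for example an href or src attribute value)
  //   [^<>]*<\/  the next tag is a closing tag, so the URL is already the
  //              text content of an element such as <a>...</a>
  var re = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
  return html.replace(re, '<a href="$1">$1</a>');
}

var input = 'Posting a link: http://test.com, posting an image ' +
  '<img src="http://test.com/2.jpg" />. Posting an actual A tag: ' +
  '<a href="http://test.com/test.html">http://test.com/test.html</a>';
var output = linkify(input);
// Only the first, plain-text URL is wrapped; the img and a tags are untouched.
```

Because the character class includes ".", a URL that ends a sentence will capture the trailing period, so the input may need extra handling for that case.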
Update:
To allow capturing inside specific tags, you can whitelist them like this:
var re = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
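To see what the whitelist changes, here is a small comparison sketch. The helper name wrapLinks and the sample string are made up for illustration; the two regexes are the ones given above.

```javascript
// Regex without a whitelist: skips any URL that is followed by a closing tag.
var plainRe = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
// Regex with a whitelist: a following </p> or </pre> no longer blocks the match.
var allowRe = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;

// Hypothetical helper that applies either regex.
function wrapLinks(html, re) {
  return html.replace(re, '<a href="$1">$1</a>');
}

var sample = '<p>Visit http://test.com/page today</p> and ' +
  '<a href="http://test.com/x">http://test.com/x</a>';

// Without the whitelist the URL inside <p> is skipped, so nothing changes.
var strictResult = wrapLinks(sample, plainRe);
// With the whitelist the URL inside <p> is wrapped; the existing <a> is left alone.
var relaxedResult = wrapLinks(sample, allowRe);
```

More tag names can be added to the (?:p|pre) group in the same way.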