I am using the function below to match URLs inside a given text and replace them for HTML links. The regular expression is working great, but currently I am only replacing the first match.
How I can replace all the URL? I guess I should be using the exec command, but I did not really figure how to do it.
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i;
return text.replace(exp,"<a href='$1'>$1</a>");
}
First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum
) vs. nonexistent (.etc
) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPV6 hostnames etc.
I've looked at a ton of libraries, and there are a few worth using despite some downsides:
href
attribute inside anchor () tags"). I'll thrown some tests at it when a demo becomes available.Libraries that I've disqualified quickly for this task:
If you insist on a regular expression, the most comprehensive is the URL regexp from Component, though it will falsely detect some non-existent two-letter TLDs by looking at it.
I've made some small modifications to Travis's code (just to avoid any unnecessary redeclaration - but it's working great for my needs, so nice job!):
function linkify(inputText) {
var replacedText, replacePattern1, replacePattern2, replacePattern3;
//URLs starting with http://, https://, or ftp://
replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
replacedText = inputText.replace(replacePattern1, '<a href="$1" target="_blank">$1</a>');
//URLs starting with "www." (without // before it, or it'd re-link the ones done above).
replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
replacedText = replacedText.replace(replacePattern2, '$1<a href="http://$2" target="_blank">$2</a>');
//Change email addresses to mailto:: links.
replacePattern3 = /(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/gim;
replacedText = replacedText.replace(replacePattern3, '<a href="mailto:$1">$1</a>');
return replacedText;
}
The regular expression in the question misses a lot of edge cases. When detecting URLs, it's always better to use a specialized library that handles international domain names, new TLDs like .museum
, parentheses and other punctuation within and at the end of the URL, and many other edge cases. See the Jeff Atwood's blog post The Problem With URLs for an explanation of some of the other issues.
The best summary of URL matching libraries is in Dan Dascalescu's Answer
(as of Feb 2014)
Add a "g" to the end of the regular expression to enable global matching:
/ig;
But that only fixes the problem in the question where the regular expression was only replacing the first match. Do not use that code.
/**
* Convert URLs in a string to anchor buttons
* @param {!string} string
* @returns {!string}
*/
function URLify(string){
var urls = string.match(/(((ftp|https?):\/\/)[\-\w@:%_\+.~#?,&\/\/=]+)/g);
if (urls) {
urls.forEach(function (url) {
string = string.replace(url, '<a target="_blank" href="' + url + '">' + url + "</a>");
});
}
return string.replace("(", "<br/>(");
}
simple example
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With