How can I write a JavaScript regular expression to replace hyperlinks in this format [*](*) with HTML hyperlinks?

I need to parse text with links in the following formats:

[html title](http://www.htmlpage.com)
http://www.htmlpage.com
http://i.imgur.com/OgQ9Uaf.jpg

The output for those three strings would be:

<a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>
<a href='http://i.imgur.com/OgQ9Uaf.jpg'>http://i.imgur.com/OgQ9Uaf.jpg</a>

The string could include an arbitrary number of these links, e.g.:

[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com

output:

<a href='http://www.htmlpage.com'>html title</a><a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a>   <a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a> wejwelfj <a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>

I have an extremely long function that does an alright job by passing over the string 3 times, but I can't successfully parse this string:

[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something.

For brevity, I'll post the regular expressions I've tried rather than the entire find/replace function:

var matchArray2 = inString.match(/\[.*\]\(.*\)/g);

This is meant to match [*](*), but it doesn't work: the greedy .* makes []()[]() come back as a single match.
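To make the greedy behaviour concrete, here is a minimal sketch comparing the expression above with a non-greedy variant. The sample string and the variable names are made up for illustration.

```javascript
// Two adjacent links, the case that trips up the greedy expression.
const input = '[a](http://x.com)[b](http://y.com)';

// Greedy: .* runs to the last "]" and the last ")", so both links collapse into one match.
const greedy = input.match(/\[.*\]\(.*\)/g);

// Non-greedy: .*? stops at the first "]" and ")" that let the pattern succeed.
const lazy = input.match(/\[.*?\]\(.*?\)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
```

The only difference between the two expressions is the ? after each .*, which is what the answers below rely on.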

Really that's it, I guess. Once I make that match, I search it for () and [] to parse out the link and the link text and build the anchor tag. I delete matches from a temp string so I don't match them again when I do my second pass to find plain hyperlinks:

var plainLinkArray = tempString2.match(/http\S*:\/\/\S*/g);

I'm not parsing any html with regex. I'm parsing a string and attempting to output html.

edit: I added the requirement that it parse the third link http://i.imgur.com/OgQ9Uaf.jpg after the fact.

my final solution (based on @Cerbrus's answer):

function parseAndHandleHyperlinks(inString)
{
    var result = inString.replace(/\[(.+?)\]\((https?:\/\/.+?)\)/g, '<a href="$2">$1</a>');
    return result.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');     
}
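As a quick check, the function above can be run against the three input formats from the question. This is only a sketch: the function body is copied unchanged, and the samples array is an illustrative name. JSON.stringify is used so leading whitespace is visible in the output.

```javascript
// Copy of the function above, exercised on the three input formats.
function parseAndHandleHyperlinks(inString) {
    var result = inString.replace(/\[(.+?)\]\((https?:\/\/.+?)\)/g, '<a href="$2">$1</a>');
    return result.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');
}

var samples = [
    '[html title](http://www.htmlpage.com)',
    'http://www.htmlpage.com',
    'text http://i.imgur.com/OgQ9Uaf.jpg'
];

samples.forEach(function (sample) {
    // Print each result as a quoted string.
    console.log(JSON.stringify(parseAndHandleHyperlinks(sample)));
});
```

Note that the second replacement always emits a leading space, so a string that starts with a plain URL comes back with one extra space in front of the anchor.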
BrennanR asked Jan 30 '13 07:01


2 Answers

Try this regex:

/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g

var s = "[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)\n\
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com";

s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>');

Regex Explanation:

# /                   - Regex Start
# \[                  - a `[` character (escaped)
# (.+?)               - Followed by one or more characters, grouped, non-greedy, so it won't match past:
# \]                  - a `]` character (escaped)
# \(                  - Followed by a `(` character (escaped)
# (https?:\/\/
#   [a-zA-Z0-9/.(]+?) - Followed by a string that starts with `http://` or `https://` and continues with letters, digits, `/`, `.` or `(`, grouped, non-greedy
# \)                  - Followed by a `)` character (escaped)
# /g                  - End of the regex, search globally.

Now the two strings inside the [] and the () are captured as $1 and $2, and placed in the following string:

'<a href="$2">$1</a>';
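To show how the two captured groups map onto the output, here is a sketch of the same replacement written with a replacer function instead of the $1/$2 string. The names linkPattern and toAnchor are made up for illustration; the regex is the one from this answer.

```javascript
// Same pattern as above.
const linkPattern = /\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g;

// The callback receives the full match first, then each captured group in order:
// title is group 1 (inside the []), url is group 2 (inside the ()).
function toAnchor(match, title, url) {
    return '<a href="' + url + '">' + title + '</a>';
}

const out = '[html title](http://www.htmlpage.com)'.replace(linkPattern, toAnchor);
console.log(out); // <a href="http://www.htmlpage.com">html title</a>
```

A replacer function behaves the same as the string here, but it leaves room to transform the groups before building the tag if that is ever needed.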

This works for your "problematic" string:

var s = "[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something."
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

// Result:

'<a href="http://i.imgur.com/iIlhrEu.jpg">This</a> one got me crying first, then once the floodgates were opened <a href="http://i.imgur.com/IwSNFVD.jpg">this</a> one did it again and <a href="http://i.imgur.com/hxIwPKJ.jpg">this</a>. Ugh, feels. Gotta go hug someone/something.'

Some more examples with "Incorrect" input:

var s = "[Th][][is](http://x.com)\n\
    [this](http://x(.com)\n\
    [this](http://x).com)"
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

//   "<a href="http://x.com">Th][][is</a>
//    <a href="http://x(.com">this</a>
//    <a href="http://x">this</a>.com)"

You can't really blame the last line for breaking, since there's no way to know if the user meant to stop the url there, or not.

To catch loose urls, add this:

.replace(/(?: |^)(https?\:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');

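The (?: |^) bit matches a space character or the start of the string, so it also catches a string that starts with a url. Without the m flag, ^ does not match after a newline, and the replacement always emits a leading space, even at the start of the string.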
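If the extra leading space or urls at the start of later lines matter, one possible variation is sketched below. It is not part of the original answer: it captures the prefix so it can be put back unchanged, and adds the m flag so that ^ also matches after a newline. The function name linkLooseUrls is hypothetical.

```javascript
// Hypothetical helper: link bare urls while keeping whatever preceded them.
function linkLooseUrls(text) {
    // ( |^) is captured and re-emitted as $1, so no extra space is introduced.
    // With the m flag, ^ matches at the start of every line, not only the string start.
    return text.replace(/( |^)(https?:\/\/[a-zA-Z0-9/.(]+)/gm, '$1<a href="$2">$2</a>');
}

console.log(linkLooseUrls('http://a.com\nsee http://b.com'));
```

Urls that already sit inside href="..." are left alone, because they are preceded by a double quote rather than a space or a line start.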

Cerbrus answered Sep 29 '22 01:09


str.replace(/\[(.*?)\]\((.*?)\)/gi, '<a href="$2">$1</a>');

This assumes that there are no errant brackets in the string or parentheses in the URL.

Then:

str.replace(/(\s|^)(https?:\/\/.*?)(?=\s|$)/gi, '$1<a href="$2">$2</a>')

This matches an "http"-like URL that follows whitespace or the start of the string, so a URL immediately preceded by a " (which would have just been added by the previous replacement) is skipped. Feel free to use a better expression if you have it, of course.

EDIT: I edited the answer because I did not realize that JS did not have lookbehind syntax. Instead, the expression matches whitespace or the beginning of the string before a plain http link. The captured whitespace has to be put back (hence the $1). The lookahead at the end ensures that everything up to the next whitespace character (or the end of the string) is captured. If whitespace is not a good boundary for you, you will have to come up with a better one.
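For reference, lookbehind assertions were added to JavaScript in ES2018, so on a runtime that supports them the capture-and-put-back step can be avoided. The following is only a sketch under that assumption; linkPlainUrls is a made-up name, and \S+ stands in for the lazy match plus lookahead used above.

```javascript
// Sketch assuming a runtime with ES2018 lookbehind assertions.
function linkPlainUrls(str) {
    // (?<=\s|^) requires whitespace or the start of the string before the URL
    // without consuming it, so nothing has to be put back. $& is the whole match.
    return str.replace(/(?<=\s|^)https?:\/\/\S+/gi, '<a href="$&">$&</a>');
}

console.log(linkPlainUrls('<a href="http://a.com">t</a> wejwelfj http://b.com'));
```

The URL inside the existing href is skipped because it is preceded by a double quote, while the trailing plain URL follows a space and gets wrapped.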

Explosion Pills answered Sep 29 '22 01:09