Regular expression to find URLs not inside a hyperlink

Tags: html, regex, url

There are many regexes out there to match a URL. However, I'm trying to match only URLs that do not appear anywhere within an <a> hyperlink tag (the HREF attribute, the inner content, etc.). So NONE of the URLs in these should match:

<a href="http://www.example.com/">something</a>
<a href="http://www.example.com/">http://www.example2.com</a>
<a href="http://www.example.com/"><b>something</b>http://www.example.com/<span>test</span></a>

Any URL outside of <a></a> should be matched.

One approach I tried was a negative lookahead that checks whether the first anchor tag after the URL is an opening <a> or a closing </a>. If it is a closing </a>, then the URL must be inside a hyperlink. I think the idea was okay, but the negative lookahead didn't work (or, more accurately, I didn't write the regex correctly). Any tips are much appreciated.


asked Aug 22 '09 09:08

Ben Amada

2 Answers

I was looking for this answer as well, and because nothing out there really worked the way I wanted it to, this is the regex that I created. Since it's a regex, be aware that this is not a perfect solution.

/(?!<a[^>]*>[^<])(((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?))(?![^<]*<\/a>)/gi

The whole function to update the HTML is:

function linkifyWithRegex(input) {
  // Same pattern as above. The trailing (?![^<]*<\/a>) rejects a URL when only
  // non-"<" text separates it from a closing </a>.
  const regx = /(?!<a[^>]*>[^<])(((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?))(?![^<]*<\/a>)/gi;
  // Wrap every match in an anchor that uses the matched text as both href and label.
  return input.replace(regx, function (match) {
    return '<a href="' + match + '">' + match + "</a>";
  });
}
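To see where the trailing lookahead helps and where it falls short, here is a minimal sketch. It uses a deliberately simplified URL pattern (`https?://...`, an assumption for illustration, not the full regex above) and compares the `(?![^<]*<\/a>)` lookahead with a variant that keeps scanning past nested tags until it reaches the next opening `<a`. The names `findUrls`, `simpleLookahead` and `nestedAwareLookahead` are hypothetical.

```javascript
// Simplified URL pattern for illustration only (an assumption, not the regex above).
const URL_PATTERN = String.raw`https?://[^\s<>"']+`;

// Lookahead from the answer: reject a URL when only non-"<" text separates it from "</a>".
const simpleLookahead = new RegExp(URL_PATTERN + String.raw`(?![^<]*</a>)`, "gi");

// Variant: scan over any characters, stopping at the next opening <a ...> tag,
// and reject the URL if a closing </a> is reached first.
const nestedAwareLookahead = new RegExp(
  URL_PATTERN + String.raw`(?!(?:(?!<a\b)[\s\S])*</a>)`,
  "gi"
);

// Collect every match of a global regex as an array of strings.
function findUrls(regex, html) {
  return Array.from(html.matchAll(regex), (m) => m[0]);
}

const sample =
  'see http://out.example/x and ' +
  '<a href="http://www.example.com/"><b>something</b>http://in.example/<span>test</span></a>' +
  ' then http://after.example';

console.log(findUrls(simpleLookahead, sample));
// → [ 'http://out.example/x', 'http://www.example.com/', 'http://in.example/', 'http://after.example' ]
console.log(findUrls(nestedAwareLookahead, sample));
// → [ 'http://out.example/x', 'http://after.example' ]
```

With nested tags such as `<b>` or `<span>` inside the link, the simple lookahead stops at the first `<` and lets the inner URLs through, while the variant rejects them. Neither is an HTML parser, so malformed or unclosed anchors can still give wrong results.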

answered Oct 23 '22 10:10

jackc

You can do it in two steps instead of trying to come up with a single regular expression:

1. Strip out (replace with nothing) each HTML anchor element in its entirety: opening tag, content and closing tag.
2. Match the URL in what remains.

In Perl it could be:

my $curLine = $_; # Work on a copy so that $_ stays available for other uses.
# Remove every HTML anchor element: "<a ...>", "</a>" and everything in between.
# The s/// operator does the substitution; /s lets .*? cross newlines and nested tags.
$curLine =~ s/<a\b[^>]*>.*?<\/a>//gis;
if ( $curLine =~ /https?:\/\// )
{
  print "Matched a URL outside an HTML anchor: $_\n";
}
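The same two steps can be sketched in JavaScript, the language of the other answer. This is a rough sketch under stated assumptions: `findUrlsOutsideAnchors`, `ANCHOR_ELEMENT` and `SIMPLE_URL` are hypothetical names, the URL pattern is deliberately simple, and each anchor is replaced with a space rather than nothing so that text on either side of a removed link is not glued together.

```javascript
// Step 1 pattern: a whole anchor element (opening tag, content, closing tag).
// The lazy [\s\S]*? lets the content span newlines and nested tags like <b> or <span>.
const ANCHOR_ELEMENT = /<a\b[^>]*>[\s\S]*?<\/a>/gi;

// Step 2 pattern: a deliberately simple URL matcher (an assumption for this sketch).
const SIMPLE_URL = /https?:\/\/[^\s<>"']+/gi;

function findUrlsOutsideAnchors(html) {
  // Replace each anchor with a space so neighbouring text stays separated.
  const withoutAnchors = html.replace(ANCHOR_ELEMENT, " ");
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return withoutAnchors.match(SIMPLE_URL) || [];
}

const input =
  'Visit http://outside.example/page or ' +
  '<a href="http://www.example.com/"><b>something</b>http://www.example.com/<span>test</span></a>.';

console.log(findUrlsOutsideAnchors(input));
// → [ 'http://outside.example/page' ]
```

Because the anchors are removed from a copy, the match positions refer to the stripped string, not the original HTML. That is fine for detecting URLs, but rewriting the original markup needs the offsets mapped back or a different approach.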

answered Oct 23 '22 10:10

Peter Mortensen
