
Regular expression replace a word by a link

I want to write a regular expression that replaces the word Paris with a link, but only where the word is not already part of a link.

Example:

    i'm living <a href="Paris" atl="Paris link">in Paris</a>, near Paris <a href="gare">Gare du Nord</a>,  i love Paris.

would become

    i'm living.........near <a href="">Paris</a>..........i love <a href="">Paris</a>.

AnhTu asked Jun 26 '26 16:06



1 Answer

This is hard to do in one step. Writing a single regex that does that is virtually impossible.

Try a two-step approach.

  1. Put a link around every "Paris" there is, regardless of whether it already sits inside another link.
  2. Find all incorrectly nested links (<a href="..."><a href="...">Paris</a></a>), and eliminate the inner link.

Regex for step one is dead-simple:

\bParis\b
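
As a minimal sketch of step one in Python (the function name and the empty `href` are placeholders, not part of the original answer), the substitution could look like this. Note that `\b` also matches inside attribute values such as `href="Paris"`, so the sketch assumes the word only occurs in text content.

```python
import re

# Placeholder markup: the empty href mirrors the expected output in the question.
LINK = '<a href="">Paris</a>'

def wrap_every_paris(html: str) -> str:
    """Step one: wrap every whole-word 'Paris', even inside existing links."""
    return re.sub(r"\bParis\b", LINK, html)

sample = '<a href="/city">in Paris</a>, near Paris, not Parisian.'
print(wrap_every_paris(sample))
# prints: <a href="/city">in <a href="">Paris</a></a>, near <a href="">Paris</a>, not Parisian.
```

The word boundaries keep "Parisian" untouched, while the first "Paris" ends up incorrectly nested, which is what step two has to clean up.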

Regex for step two is slightly more complex:

(<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>(Paris)</a>

Use that one on the whole string and replace each match with the content of match groups 1 and 2, effectively removing the surplus inner link.

Explanation of regex #2 in plain words:

  • Find every link (<a[^>]+>), optionally followed by any run of characters that does not contain a closing link ((?:(?!</a>).)*?). The look-ahead is checked before each character, so the match can never run past the end of the outer link. Save it into match group 1.
  • Now look for the next link (<a[^>]+>). Make sure it is there, but do not save it.
  • Now look for the word Paris. Save it into match group 2.
  • Look for a closing link (</a>). Make sure it is there, but don't save it.
  • Replace everything with the content of groups 1 and 2, thereby losing everything you did not save.
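
The two steps can be sketched together in Python as follows. This is only an illustration: `link_paris` is a hypothetical helper name, the empty `href` is a placeholder, and the sample text is adapted from the question so that "Paris" does not occur inside an attribute value. The step-two pattern expresses "no closing link in between" as `(?:(?!</a>).)*?`, which tests the look-ahead before every character.

```python
import re

STEP_ONE = re.compile(r"\bParis\b")
# Group 1: an opening link plus following characters, none of which starts "</a>".
# Group 2: the word itself; the inner <a ...> and </a> are matched but not captured.
STEP_TWO = re.compile(r"(<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>(Paris)</a>")

def link_paris(html: str) -> str:
    # Step one: link every occurrence, nested or not.
    wrapped = STEP_ONE.sub('<a href="">Paris</a>', html)
    # Step two: keep groups 1 and 2, dropping the surplus inner link.
    return STEP_TWO.sub(r"\1\2", wrapped)

text = ('I live <a href="/city">in Paris</a>, near Paris '
        '<a href="gare">Gare du Nord</a>, I love Paris.')
print(link_paris(text))
# prints: I live <a href="/city">in Paris</a>, near <a href="">Paris</a> <a href="gare">Gare du Nord</a>, I love <a href="">Paris</a>.
```

One pass of step two removes at most one inner link per outer link, so if an existing link contains the word several times, the second substitution may need to be repeated until the string stops changing.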

The approach assumes these side conditions:

  • Your input HTML is not horribly broken.
  • Your regex flavor supports non-greedy quantifiers (*?) and zero-width negative look-ahead assertions ((?!...)).
  • You wrap the word "Paris" only in a link in step 1, with no additional characters. Every "Paris" becomes "<a href="...">Paris</a>", or step two will fail (unless you change the second regex accordingly).
  • BTW: regex #2 explicitly allows for constructs like this:

    <a href="">in the <b>capital of France</b>, <a href="">Paris</a></a>

    The surplus inner link comes from step one; the replacement result of step two will be:

    <a href="">in the <b>capital of France</b>, Paris</a>
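
This case can be checked with a short Python sketch. The helper name `remove_inner_link` is hypothetical, and the pattern is a variant of regex #2 that tests the look-ahead before each character, so other tags such as `<b>...</b>` may sit between the outer link and the inner one, but a closing `</a>` may not.

```python
import re

STEP_TWO = re.compile(r"(<a[^>]+>(?:(?!</a>).)*?)<a[^>]+>(Paris)</a>")

def remove_inner_link(html: str) -> str:
    """Step two only: drop a link around 'Paris' that sits inside another link."""
    return STEP_TWO.sub(r"\1\2", html)

nested = '<a href="">in the <b>capital of France</b>, <a href="">Paris</a></a>'
print(remove_inner_link(nested))
# prints: <a href="">in the <b>capital of France</b>, Paris</a>
```

A link that is not nested, such as `<a href="">Paris</a>` on its own, is left untouched because no second opening link appears before its `</a>`.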

Tomalak answered Jun 28 '26 10:06



