
Removing <a href> tags using regex

Tags:

html

regex

I want to extract the plain text from a given piece of HTML. I tried using a regex and came up with:

String target = val.replaceAll("<a.*</a>", "");

My main requirement is to remove everything between <a> and </a>, including the link name. With the code above, all the other content is removed as well.

<a href="www.google.com">Google</a> This is a Google Link

<a href="www.yahoo.com">Yahoo</a> This is a Yahoo Link

Here I want to remove everything between <a> and </a>. The final output should be:

This is a Google Link This is a Yahoo Link

Sathesh S asked Jan 01 '14 10:01


1 Answer

Use a non-greedy quantifier (*?). For example, to remove the link entirely:

String target = val.replaceAll("<a.*?</a>", "");

Or to replace the link with just the link tag's contents:

String target = val.replaceAll("<a[^>]*>(.*?)</a>", "$1");

However, I would still recommend using a proper DOM manipulation API.
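To see why the quantifier matters, here is a minimal sketch that runs the greedy and non-greedy patterns against the sample from the question. The class and method names (AnchorStripDemo, stripGreedy, stripReluctant, unwrapLinks) are made up for illustration, and the input is assumed to sit on a single line, since . does not match line breaks by default.

```java
class AnchorStripDemo {
    // Greedy: ".*" extends to the LAST "</a>" in the input,
    // so the text between the two links is removed as well.
    static String stripGreedy(String html) {
        return html.replaceAll("<a.*</a>", "");
    }

    // Non-greedy: ".*?" stops at the FIRST "</a>" after each "<a",
    // so each link is removed separately.
    static String stripReluctant(String html) {
        return html.replaceAll("<a.*?</a>", "");
    }

    // Drops the tags but keeps the link text captured in group 1.
    static String unwrapLinks(String html) {
        return html.replaceAll("<a[^>]*>(.*?)</a>", "$1");
    }

    public static void main(String[] args) {
        // Sample input from the question, assumed to be on one line.
        String val = "<a href=\"www.google.com\">Google</a> This is a Google Link "
                   + "<a href=\"www.yahoo.com\">Yahoo</a> This is a Yahoo Link";

        System.out.println("[" + stripGreedy(val) + "]");
        System.out.println("[" + stripReluctant(val) + "]");
        System.out.println("[" + unwrapLinks(val) + "]");
    }
}
```

The non-greedy version leaves the whitespace that surrounded each link in place, so a trim() and a whitespace collapse may still be needed to get exactly the output asked for. Note also that <a.*?</a> starts matching at any tag whose name begins with "a", such as <abbr>, which is one more reason a DOM API is the safer choice.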

p.s.w.g answered Nov 11 '22 23:11