
How to delete a specific HTML class and its content using the Java Html class

Tags:

java

html

regex

I am working on an Android project that parses data from the WordPress API. The post content comes back as HTML, so I have to remove the HTML tags. Using Html.fromHtml().toString() I removed all the tags, but the text of some image captions is still there and I have to delete it as well. To delete a caption I have to find it by its tag class. How can I delete this content using the Html class?

<p class="wp-caption-text">android m marshmallow</

EDIT :

I solved my problem using a regular expression.

Write a regular expression that matches the specific HTML you want to remove, strip it with replaceAll, and then convert what is left:

 yourHtml = yourHtml.replaceAll("Your_Regular_Expression","");
 yourHtml = Html.fromHtml(yourHtml).toString();
Yeahia2508 asked Aug 20 '15 16:08



1 Answer

If you want to get a match you can try this:

<(\w+).*?class="wp-caption-text".*?>[\s\S]*?<\/\1>

Regex101
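For reference, here is a minimal sketch of how the pattern above could be applied in plain Java. The class name CaptionStripper and the sample input are made up for illustration. On Android you would then pass the result to Html.fromHtml(...).toString(), as in the question's edit. In a Java string literal the backslashes are doubled, and the \/ escape is not needed. The sample puts the caption on its own line: since . does not match line breaks by default, a match cannot start at a tag on an earlier line.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the name CaptionStripper is not from the original post.
final class CaptionStripper {
    // The answer's pattern as a Java string literal (backslashes doubled).
    private static final Pattern CAPTION = Pattern.compile(
            "<(\\w+).*?class=\"wp-caption-text\".*?>[\\s\\S]*?</\\1>");

    // Removes every element matched by the pattern, tags and text included.
    static String stripCaptions(String html) {
        return CAPTION.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration: the caption sits on its own line.
        String html = "<p>Intro</p>\n"
                + "<p class=\"wp-caption-text\">android m marshmallow</p>\n"
                + "<p>Body</p>";
        // The caption element is gone; the surrounding paragraphs remain.
        System.out.println(stripCaptions(html));
    }
}
```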

I'd like to mention that this is not a perfect solution. Regular expressions are not well suited to parsing HTML: the markup allows arbitrarily nested structures, which a regular expression cannot match reliably in every case.
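To make the caveat concrete, the sketch below runs the pattern on two inputs where it misbehaves. The class name CaptionRegexLimits, the TIGHTER variant, and the sample strings are my own assumptions, not part of the answer above. TIGHTER swaps .*? for [^>]* so the attribute search stays inside a single opening tag. That seems to help with the first case, but it does nothing for nesting.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative assumptions.
final class CaptionRegexLimits {
    // The pattern from the answer, as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile(
            "<(\\w+).*?class=\"wp-caption-text\".*?>[\\s\\S]*?</\\1>");

    // Assumed variant: [^>]* keeps the class lookup inside one opening tag.
    static final Pattern TIGHTER = Pattern.compile(
            "<(\\w+)[^>]*class=\"wp-caption-text\"[^>]*>[\\s\\S]*?</\\1>");

    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Case 1: everything on one line. ORIGINAL starts matching at the
        // first <p> and swallows the intro paragraph along with the caption.
        String sameLine = "<p>Intro</p><p class=\"wp-caption-text\">cap</p><p>Body</p>";
        System.out.println(strip(ORIGINAL, sameLine)); // <p>Body</p>
        System.out.println(strip(TIGHTER, sameLine));  // <p>Intro</p><p>Body</p>

        // Case 2: a same-name element nested in the caption. The lazy match
        // stops at the first </div>, so part of the caption is left behind.
        String nested = "<div class=\"wp-caption-text\"><div>inner</div> tail</div>";
        System.out.println(strip(TIGHTER, nested));    // " tail</div>"
    }
}
```

If the markup can nest like this, an HTML parser is probably the safer tool than either pattern.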

d0nut answered Oct 02 '22 14:10
