 

Parsing html with Jsoup and removing spans with certain style

I'm writing an app for a friend, but I ran into a problem: the website contains spans like this one:

<span style="display:none">&amp;0000000000000217000000</span>

We have no idea what they are, but I need them removed because my app is outputting their values.

Is there a way to check whether such a span is in the Elements and remove it? I'm parsing with a for-each loop, but I can't figure out how to remove this element effectively.

Thanks!

Samuel asked Dec 28 '22 00:12


1 Answer

If you want to remove those spans completely based on their style attribute, select them with an attribute-contains selector and remove them:

String html = "<span style=\"display:none\">&amp;0000000000000217000000</span>";
html += "<span style=\"display:none\">&amp;1111111111111111111111111</span>";
html += "<p>Test paragraph should not be removed</p>";

Document doc = Jsoup.parse(html);

doc.select("span[style*=display:none]").remove();

System.out.println(doc);

Here is the output:

<html>
 <head></head>
 <body>
  <p>Test paragraph should not be removed</p>
 </body>
</html>
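The `*=` selector above is a plain substring match, so a span written as `style="display: none"` (with a space) would not be caught. A minimal sketch, assuming the site might vary the spacing or letter case of the inline style, does the check inside a for-each loop like the one mentioned in the question. The class name `RemoveHiddenSpans` and the helpers `isHidden` and `removeHiddenSpans` are hypothetical names chosen for illustration. It only inspects inline `style` attributes; spans hidden through CSS classes or stylesheets are not detected.

```java
import java.util.Locale;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemoveHiddenSpans {

    // True if the inline style hides the element, ignoring whitespace and case,
    // e.g. "display:none", "display: none", "DISPLAY:NONE;color:red".
    static boolean isHidden(Element el) {
        String style = el.attr("style").replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        return style.contains("display:none");
    }

    // Removes every <span> whose inline style hides it.
    static void removeHiddenSpans(Document doc) {
        // Only spans that actually carry a style attribute are candidates.
        for (Element span : doc.select("span[style]")) {
            if (isHidden(span)) {
                // Detaches the span from the DOM; the Elements list being iterated is not modified.
                span.remove();
            }
        }
    }

    public static void main(String[] args) {
        String html = "<p>Price: 12<span style=\"display: none\">&amp;0000000000000217000000</span>.50</p>"
                + "<p>Test paragraph should not be removed</p>";

        Document doc = Jsoup.parse(html);
        removeHiddenSpans(doc);

        // Print the remaining text of each paragraph.
        for (Element p : doc.select("p")) {
            System.out.println(p.text());
        }
    }
}
```

Running `main` should print `Price: 12.50` followed by `Test paragraph should not be removed`, showing that the hidden value no longer leaks into the extracted text.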
B. Anderson answered Dec 29 '22 16:12