I want to replace some elements in HTML files, keeping all the other content unchanged.
Document doc = Jsoup.parse("<div id=title>Old</div >\n" +
"<p>1<p>2\n" +
"<table><tr><td>1</td></tr></table>");
doc.getElementById("title").text("New");
System.out.println(doc.toString());
I expect to have the following output:
<div id=title>New</div >
<p>1<p>2
<table><tr><td>1</td></tr></table>
Instead, I have:
<html>
<head></head>
<body>
<div id="title">New</div>
<p>1</p>
<p>2 </p>
<table>
<tbody>
<tr>
<td>1</td>
</tr>
</tbody>
</table>
</body>
</html>
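Jsoup added:
- html, head, body and tbody elements
- closing </p> tags
- quotes around the attribute value
- indentation and line breaks
It also normalised </div > to </div>.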
Can I serialise the modified HTML back in its original form? Jericho does that, but it doesn’t provide slick DOM manipulation methods like Jsoup does.
Is there a reason why attribute values shouldn't get quoted? See here and here.
For the other points try this:
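final String html = "<div id=title>Old</div >\n"
+ "<p>1<p>2\n"
+ "<table><tr><td>1</td></tr></table>";
Document doc = Jsoup.parse(html);
// Change the text of the element with id "title"
doc.select("[id=title]").first().text("New");
// Remove the wrapper elements Jsoup added, keeping their children in place
doc.select("body, head, html, tbody").unwrap();
// Turn off pretty-printing so no extra indentation or line breaks are added
doc.outputSettings().prettyPrint(false);
System.out.println(doc);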
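If the rest of the markup really has to survive byte for byte, another option is to skip DOM serialisation and splice the new text into the original string by position. The following is only a rough sketch of that idea in plain Java, not Jsoup or Jericho API. The names SpliceById and replaceTextById are made up for illustration, and it assumes the target element contains no nested element with the same tag name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or Jericho.
final class SpliceById {

    // Minimal escaping so the new text cannot introduce markup.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Replaces the content between the start tag that carries the given id and
     * the next closing tag of the same name. Everything outside that span is
     * copied verbatim. Returns the input unchanged if no such element is found.
     * Assumption: no nested element with the same tag name inside the target.
     */
    static String replaceTextById(String html, String tag, String id, String newText) {
        Pattern p = Pattern.compile(
                // group 1: start tag with id=value, quoted or unquoted
                "(<" + Pattern.quote(tag) + "\\b[^>]*\\sid\\s*=\\s*[\"']?"
                        + Pattern.quote(id) + "[\"']?(?=[\\s>])[^>]*>)"
                        // group 2: the content to replace (as short as possible)
                        + "(.*?)"
                        // stop before the closing tag, tolerating "</div >"
                        + "(?=</" + Pattern.quote(tag) + "\\s*>)",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return html;
        }
        // Splice by position so the surrounding source is untouched.
        return html.substring(0, m.start(2)) + escape(newText) + html.substring(m.end(2));
    }

    public static void main(String[] args) {
        String html = "<div id=title>Old</div >\n"
                + "<p>1<p>2\n"
                + "<table><tr><td>1</td></tr></table>";
        System.out.println(replaceTextById(html, "div", "title", "New"));
    }
}
```

This keeps the unquoted attribute, the odd </div > and the unclosed p tags exactly as they were, but it is a regex over HTML, so it only suits simple, predictable markup. For anything more involved, a real parser is the safer choice.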