I have HTML content that looks like this:
<body>Hello world</div><div>New day</div></body>
I would like to parse this HTML snippet and add a starting div tag before Hello. What approach could I follow? I tried HtmlCleaner, but it didn't help. Basically, this means finding closing div tags that have no matching opening div tag and adding the missing opening tags.
If you are using Java, try Jsoup. Something like:
Jsoup.clean("<body><div>Hello world</div><div>New day</div></body>", Whitelist.relaxed());
This will give you the cleaned output string. (In recent jsoup releases the Whitelist class is named Safelist.)
UPDATE
You can use Jsoup.parse(html), which returns a Document. Calling toString() on it gives you the fixed HTML, which will include the html and body tags as well. It will give you the following output for your HTML.
<html>
<head></head>
<body>
<div>
Hello world
</div>
<div>
New day
</div>
</body>
</html>
As you said, most parsers will fix missing end tags but not missing start tags, because they cannot decide where the start tag should go. The only position they can be sure of is immediately before the stray end tag, and adding a start tag there would be useless.
You may need to implement your own logic for that, as in Trevor Hutto's suggestion (a stack-based approach) below, but it will have its own complications depending on your requirements.
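To make the stack-based idea concrete, here is a minimal sketch in plain Java (no Jsoup). It is not Trevor Hutto's code; the class name OrphanCloseTagFixer, the method fix, and the placement rule are all assumptions made for illustration. The rule assumed here is: when a closing tag has no open match, insert the missing opening tag right after the parent's opening tag, or right after the previous sibling's closing tag if there is one. The regex-based tag scan ignores comments, scripts, and attribute values containing ">", so treat it as a starting point rather than a real parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrphanCloseTagFixer {

    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Elements without a closing tag; they are never pushed on the stack (partial list).
    private static final Set<String> VOID_TAGS = Set.of("br", "hr", "img", "input", "meta", "link");

    // One open element: its name and the output offset where its next child may start.
    private static final class Frame {
        final String name;
        int childStart;

        Frame(String name, int childStart) {
            this.name = name;
            this.childStart = childStart;
        }
    }

    static String fix(String html) {
        StringBuilder out = new StringBuilder();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame("", 0)); // synthetic root, never popped

        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy the text before this tag
            pos = m.end();
            String tag = m.group();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();

            if (!closing) {
                out.append(tag);
                if (!tag.endsWith("/>") && !VOID_TAGS.contains(name)) {
                    stack.push(new Frame(name, out.length()));
                }
            } else if (isOpen(stack, name)) {
                // Normal close: drop any unclosed inner elements, then the match itself.
                while (!stack.peek().name.equals(name)) {
                    stack.pop();
                }
                stack.pop();
                out.append(tag);
                stack.peek().childStart = out.length();
            } else {
                // Orphan close: insert the missing opening tag where the
                // current parent's next child would have started.
                Frame parent = stack.peek();
                out.insert(parent.childStart, "<" + name + ">");
                out.append(tag);
                parent.childStart = out.length();
            }
        }
        out.append(html, pos, html.length()); // trailing text after the last tag
        return out.toString();
    }

    private static boolean isOpen(Deque<Frame> stack, String name) {
        for (Frame f : stack) {
            if (f.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // prints <body><div>Hello world</div><div>New day</div></body>
        System.out.println(fix("<body>Hello world</div><div>New day</div></body>"));
    }
}
```

The complications mentioned above show up quickly: the sketch has to guess how far back the missing element should reach, and a different requirement (for example, wrapping only the last text node) would need a different placement rule.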