
How to add matching start tag in HTML

I have HTML content that looks like this:

<body>Hello world</div><div>New day</div></body>

I would like to parse this HTML snippet and add an opening div tag before "Hello". What approach could I follow? I tried HtmlCleaner, but it didn't help. In other words, I want to find closing div tags that have no matching opening div tag and add the missing opening tags.

Thunderhashy asked Nov 02 '22 03:11


1 Answer

If you are using Java, try Jsoup. Something like:

Jsoup.clean("<body><div>Hello world</div><div>New day</div></body>", Whitelist.relaxed());

This will give you the proper output string.

UPDATE

You can use Jsoup.parse(html), which returns a Document. Calling toString() on it gives you the fixed HTML, including the html and body tags. It will give you the following output for your HTML:

   <html>
    <head></head>
    <body>
      <div>
        Hello world
      </div>
      <div>
        New day
      </div>
    </body>
   </html>

As you said, most parsers will fix missing end tags but not missing start tags, because they cannot decide where the start tag belongs. The only unambiguous position is immediately before the unmatched end tag, and adding the start tag there would be useless.

You may need to implement your own logic for that, such as the stack-based approach Trevor Hutto suggests below, but it will have its own complications depending on your requirements.
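
To make the stack-based idea concrete, here is a rough sketch in plain Java (no Jsoup). It is only one possible heuristic, not a definitive implementation: the class name DivTagFixer and the method addMissingStartTags are made up for illustration, and it assumes the missing opening tag belongs right after the previous tag in the input, which matches what the question asks for but may not suit every document. It also only looks at div tags, treats a self-closing div as an ordinary opening tag, and does not handle tags inside comments or scripts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or any other library.
final class DivTagFixer {

    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String addMissingStartTags(String html) {
        StringBuilder out = new StringBuilder();
        // Stack of input offsets of the <div> tags that are still open.
        Deque<Integer> openDivs = new ArrayDeque<>();
        int lastTagEnd = 0; // offset in 'out' right after the most recent tag
        int cursor = 0;     // offset in 'html' of the text not yet copied

        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // Copy the text between the previous tag and this one.
            out.append(html, cursor, m.start());

            boolean closing = !m.group(1).isEmpty();
            boolean isDiv = m.group(2).equalsIgnoreCase("div");

            if (isDiv && !closing) {
                openDivs.push(m.start());
            } else if (isDiv && closing) {
                if (openDivs.isEmpty()) {
                    // Unmatched </div>: assume the div should have started
                    // right after the previous tag.
                    out.insert(lastTagEnd, "<div>");
                } else {
                    openDivs.pop();
                }
            }

            out.append(m.group());
            lastTagEnd = out.length();
            cursor = m.end();
        }
        // Copy whatever text follows the last tag.
        out.append(html, cursor, html.length());
        return out.toString();
    }
}
```

Calling DivTagFixer.addMissingStartTags on the snippet from the question should wrap "Hello world" in a div and leave the already balanced "New day" div untouched. Whether "right after the previous tag" is the correct place for the new start tag is exactly the complication mentioned above, so adjust that rule to your requirements.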

RP- answered Nov 14 '22 04:11