I'm generating HTML that's all crushed together, and I'd like to reformat it with proper indentation. I've been trying to use xmllint for this, but with no joy. For example, when file.html contains:
<table><tr><td><b>Foo</b></td></tr></table>
<table><tr><td>Bar</td></tr></table>
I get:
$ xmllint --format file.html
file.html:2: parser error : Extra content at the end of the document
<table><tr><td>Bar</td></tr></table>
^
<<< exit status [1] >>>
But when file.html contains either of those lines alone, it works fine. For example, after removing the second line:
$ xmllint --format file.html
<?xml version="1.0"?>
<table>
<tr>
<td>
<b>Foo</b>
</td>
</tr>
</table>
When I include the --html option, it's more likely to run without errors, but then it doesn't indent.
Any suggestions? Are there any other (*nix) tools I can use for this? Thanks ...
As user 4M01 suggested: on the command line, pipe the output of xmllint into HTML Tidy.
Tidy will repair the HTML output from xmllint and wrap some reasonable ... around your HTML fragment.
xmllint --xpath "//tr[6]/td[7]" --html - | tidy -q
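The parser error in the question is consistent with XML's rule that a document has exactly one root element: with two top-level <table> elements, the second one is "extra content". If the fragments are otherwise well-formed XML (every tag closed, attributes quoted), one possible workaround is to wrap them in a temporary root and pretty-print each child separately. Below is a minimal sketch using Python's standard-library xml.dom.minidom; the function name pretty_print_fragments and the wrapper tag name "root" are made up for illustration, and this will not cope with real-world HTML such as an unclosed <br>.

```python
import xml.dom.minidom


def pretty_print_fragments(markup: str, root_tag: str = "root") -> str:
    """Indent several top-level XML fragments (hypothetical helper).

    Assumes the input is well-formed XML apart from lacking a single root.
    """
    # Wrap the fragments so the parser sees exactly one root element.
    wrapped = f"<{root_tag}>{markup}</{root_tag}>"
    dom = xml.dom.minidom.parseString(wrapped)

    pieces = []
    for node in dom.documentElement.childNodes:
        # Skip the whitespace text nodes that sit between the fragments.
        if node.nodeType == node.ELEMENT_NODE:
            pieces.append(node.toprettyxml(indent="  "))
    return "".join(pieces)


sample = (
    "<table><tr><td><b>Foo</b></td></tr></table>\n"
    "<table><tr><td>Bar</td></tr></table>"
)
print(pretty_print_fragments(sample), end="")
```

Because the temporary root is dropped again before printing, the output contains only the original fragments, each starting at column zero. If the input already contains line breaks or indentation inside the elements, minidom keeps that whitespace as text nodes, so the result may contain stray blank lines.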