I have this huge xml file which contains a lot of comments.
Whats the "best way" to strip out all the comments and nicely format the xml from the linux command line?
you can use tidy
$ tidy -quiet -asxml -xml -indent -wrap 1024 --hide-comments 1 tomcat-users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<user username="qwerty" password="ytrewq" roles="manager-gui" />
</tomcat-users>
Run your XML through an identity transform XSLT, with an empty template for comments.
All of the XML content, except for the comments, will be passed through to the output.
In order to niecely format the output, set the output @indent="yes":
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<!--Match on Attributes, Elements, text nodes, and Processing Instructions-->
<xsl:template match="@*| * | text() | processing-instruction()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!--Empty template prevents comments from being copied into the output -->
<xsl:template match="comment()"/>
</xsl:stylesheet>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With