
Strip all tags from an XML file

I have a large XML file. I would like to strip all the tags and keep only the node values, with each node value on a separate line. How can I do that?

Can I use free software to do it, or PHP or ASP.NET code? I also looked at the XSLT option. It is probably too much for regex. I explored the PHP options and looked at simplexml_load_file(), strip_tags(), and file_get_contents(), but failed.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- a comment -->
<catalog>
    <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
                <address>
                         <city>Melbourne </city>
                         <zip>01803 </zip>
                </address>
        <year>1985</year>
    </cd>
    <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
    </cd>

</catalog>

Edit: This is what I tried, among other things.

<?php

$xml = simplexml_load_file('myxml.xml');
echo strip_tags($xml);

?>
Hammad Khan asked Jan 17 '23 16:01



1 Answer

This should do ya:

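<?php
// Read the raw XML as a plain string (no parsing).
$xml = file_get_contents('myxml.xml');
// Insert an HTML line break before every newline in the string.
$xml = nl2br($xml);
// Remove every tag except the <br> tags added above.
echo strip_tags($xml, "<br>");
?>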

You were missing line breaks because the XML file stores them as plain-text newlines (\n), whereas a browser rendering the output as HTML only breaks lines at explicit <br> tags. PHP provides the handy nl2br() function to insert those for you, and passing "<br>" as the second argument to strip_tags() keeps them while every other tag is removed.
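
If you need exactly one trimmed value per line, without the leftover indentation and blank lines that strip_tags() leaves behind, a different technique is to parse the file and collect its text nodes. This is only a rough sketch and not part of the approach above: it swaps in DOMDocument and DOMXPath, the helper name xml_text_lines() is made up for illustration, and the sample string is a shortened copy of the XML from the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed,
// non-empty value of every text node, in document order.
function xml_text_lines(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);

    $xpath = new DOMXPath($doc);
    $lines = [];
    // text() selects text nodes only, so comments and the XML declaration are skipped.
    foreach ($xpath->query('//text()') as $node) {
        $value = trim($node->nodeValue);
        // Whitespace-only nodes (the indentation between tags) trim to '' and are dropped.
        if ($value !== '') {
            $lines[] = $value;
        }
    }
    return $lines;
}

// Shortened copy of the XML from the question, used as sample input.
$sample = <<<XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- a comment -->
<catalog>
    <cd>
        <title>Empire Burlesque</title>
        <address>
            <city>Melbourne </city>
            <zip>01803 </zip>
        </address>
    </cd>
</catalog>
XML;

// One value per line: Empire Burlesque, Melbourne, 01803.
echo implode(PHP_EOL, xml_text_lines($sample)), PHP_EOL;
```

For the real file you would pass file_get_contents('myxml.xml') instead of $sample. When the result is shown in a browser, the newlines still need nl2br() or a surrounding <pre> element, for the same reason described above.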

Connor Peet answered Jan 25 '23 15:01
