
How to parse XML for <![CDATA[]]>

How do I parse an XML document that has data wrapped in <![CDATA[...]]>? How can I parse the XML and get the data inside the CDATA section?

GOK asked Dec 13 '11 12:12


People also ask

What does <![CDATA[ in XML mean?

A CDATA section marks a part of an XML document so that the XML parser interprets it only as character data, not as markup. It comes in handy when one XML document needs to be embedded within another XML document.

How do I ignore CDATA?

There is no way to ignore a CDATA section: it is part of the XML spec and parsers should honour it. You could get the contents of the CDATA section and parse them as XML again, but this is not recommended.

Can we use CDATA in XML attribute?

The spec says that an attribute value must not contain an open angle bracket; open angle brackets and ampersands must be escaped. Therefore you cannot insert a CDATA section into an attribute value.


3 Answers

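import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static void main(String[] args) throws Exception {
  File file = new File("data.xml");
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  // Needed if you are using this code for BlackBerry XML parsing.
  // Coalescing is configured on the factory, not on the builder: it converts
  // CDATA sections to text nodes and merges them with adjacent text.
  factory.setCoalescing(true);
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document doc = builder.parse(file);

  // Print the title of every <topic> element.
  NodeList nodes = doc.getElementsByTagName("topic");
  for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    NodeList title = element.getElementsByTagName("title");
    Element line = (Element) title.item(0);
    System.out.println("Title: " + getCharacterDataFromElement(line));
  }
}

// Returns the data of the first child if it is text or CDATA, else "".
public static String getCharacterDataFromElement(Element e) {
  Node child = e.getFirstChild();
  if (child instanceof CharacterData) {
    CharacterData cd = (CharacterData) child;
    return cd.getData();
  }
  return "";
}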

( http://www.java2s.com/Code/Java/XML/GetcharacterdataCDATAfromxmldocument.htm )

Thargor answered Oct 14 '22 02:10
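To illustrate what coalescing changes in the DOM approach above, here is a minimal sketch. The class name CoalescingDemo, the helper parseRoot, and the sample <title> input are made up for this illustration and are not part of the original answer. It parses the same string twice and prints how the element's children are represented each time.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CoalescingDemo {
    // Hypothetical sample input: plain text followed by a CDATA section.
    static final String XML = "<title>Hello <![CDATA[<world>]]></title>";

    // Parses XML and returns its root element, with coalescing on or off.
    static Element parseRoot(boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element split = parseRoot(false);
        Element merged = parseRoot(true);

        // Without coalescing, the text and the CDATA section are separate
        // child nodes, so the first child holds only the leading text.
        System.out.println(split.getChildNodes().getLength());
        System.out.println("[" + split.getFirstChild().getNodeValue() + "]");

        // With coalescing, the CDATA section is folded into the text node.
        System.out.println(merged.getChildNodes().getLength());
        System.out.println("[" + merged.getFirstChild().getNodeValue() + "]");

        // getTextContent() concatenates all text either way.
        System.out.println(split.getTextContent());
    }
}
```

This also suggests why getCharacterDataFromElement, which only looks at the first child, can miss part of the content when coalescing is off and the element mixes plain text with CDATA.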


Since all previous answers use a DOM-based approach, this is how to parse CDATA with a stream-based approach using StAX.

Use the following pattern:

  // r is the XMLStreamReader
  switch (r.getEventType()) {
  case XMLStreamConstants.CHARACTERS:
  case XMLStreamConstants.CDATA:
      System.out.println(r.getText());
      break;
  default:
      break;
  }

Complete sample:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void readCDATAFromXMLUsingStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile));) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                System.out.println(r.getText());
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

With /path/toYour/sample/file.xml containing:

 <data>
    <![CDATA[ Sat Nov 19 18:50:15 2016 (1672822)]]>
    <![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]>
 </data>

Gives:

 Sat Nov 19 18:50:15 2016 (1672822)                             
 Sat, 19 Nov 2016 18:50:14 -0800 (PST)       
jschnasse answered Oct 14 '22 03:10
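When only the text of specific elements is needed, a possible variation on the StAX loop above is to let XMLStreamReader.getElementText() collect the character data, CDATA included, of a text-only element. The sketch below is an assumption-laden illustration: the class name StaxElementTextDemo, the helper readItems, and the <item> elements are hypothetical and do not appear in the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxElementTextDemo {
    // Returns the text of every <item> element (a hypothetical element name).
    static List<String> readItems(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        try {
            while (r.hasNext()) {
                // getElementText() must be called on a START_ELEMENT event;
                // it reads up to the matching END_ELEMENT and returns the
                // concatenated text, whether it came from CDATA or not.
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(r.getLocalName())) {
                    texts.add(r.getElementText());
                }
            }
        } finally {
            r.close();
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data>"
                + "<item><![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]></item>"
                + "<item>plain text</item>"
                + "</data>";
        for (String text : readItems(xml)) {
            System.out.println(text);
        }
    }
}
```

Note that getElementText() only works for elements without child elements; for mixed content the event loop from the answer is the safer choice.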


CDATA just says that the enclosed data is not parsed as markup, so it does not need to be escaped. So, just take the element's text. The XML parser should return the plain data without the CDATA markers.

AlexR answered Oct 14 '22 03:10
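A small sketch of that last point, assuming the standard JAXP DOM parser: text wrapped in CDATA and the same text written with entity escapes come back from the parser as the same string. The class name CdataTextDemo, the helper rootText, and the <expr> element are invented for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataTextDemo {
    // Returns the text content of the root element of the given XML string.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The same content, once inside CDATA and once entity-escaped.
        String viaCdata = rootText("<expr><![CDATA[1 < 2 & 3]]></expr>");
        String viaEscapes = rootText("<expr>1 &lt; 2 &amp; 3</expr>");

        System.out.println(viaCdata);                     // the text, without CDATA markers
        System.out.println(viaCdata.equals(viaEscapes));  // both forms compare equal
    }
}
```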