How to parse a XML having data included in <![CDATA[---]...
how can we parse the xml and get the data included in CDATA
???
A CDATA section is used to mark a section of an XML document, so that the XML parser interprets it only as character data, and not as markup. It comes handy when one XML data need to be embedded within another XML document.
There is no way to ignore the CDATA tag - it's part of the xml spec and parsers should honour it. If you don't like the idea of this answer to your earlier question, you could get the contents of the CDATA section and parse it as XML again. However, this is highly not recommended!
The spec says that the Attribute value must not have an open angle bracket. Open angle brackets and ampersand must be escaped. Therefore you cannot insert a CDATA section.
public static void main(String[] args) throws Exception {
File file = new File("data.xml");
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
//if you are using this code for blackberry xml parsing
builder.setCoalescing(true);
Document doc = builder.parse(file);
NodeList nodes = doc.getElementsByTagName("topic");
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName("title");
Element line = (Element) title.item(0);
System.out.println("Title: " + getCharacterDataFromElement(line));
}
}
public static String getCharacterDataFromElement(Element e) {
Node child = e.getFirstChild();
if (child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
return cd.getData();
}
return "";
}
( http://www.java2s.com/Code/Java/XML/GetcharacterdataCDATAfromxmldocument.htm )
Since all previous answers are using a DOM based approach. This is how to parse CDATA with a stream based approach using STAX.
Use the following pattern:
switch (EventType) {
case XMLStreamConstants.CHARACTERS:
case XMLStreamConstants.CDATA:
System.out.println(r.getText());
break;
default:
break;
}
Complete sample:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
public void readCDATAFromXMLUsingStax() {
String yourSampleFile = "/path/toYour/sample/file.xml";
XMLStreamReader r = null;
try (InputStream in =
new BufferedInputStream(new FileInputStream(yourSampleFile));) {
XMLInputFactory factory = XMLInputFactory.newInstance();
r = factory.createXMLStreamReader(in);
while (r.hasNext()) {
switch (r.getEventType()) {
case XMLStreamConstants.CHARACTERS:
case XMLStreamConstants.CDATA:
System.out.println(r.getText());
break;
default:
break;
}
r.next();
}
} catch (Exception e) {
throw new RuntimeException(e);
} finally {
if (r != null) {
try {
r.close();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
}
With /path/toYour/sample/file.xml
<data>
<![CDATA[ Sat Nov 19 18:50:15 2016 (1672822)]]>
<![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]>
</data>
Gives:
Sat Nov 19 18:50:15 2016 (1672822)
Sat, 19 Nov 2016 18:50:14 -0800 (PST)
CDATA
just says that the included data should not be escaped. So, just take the tag text. XML parser should return the clear data without CDATA
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With