 

Reading CDATA XML in Java

Tags:

java

parsing

xml

I'm trying to parse CDATA sections in XML. The code runs fine and prints "Links:" in the console (about 50 times, because that's how many links I have), but the links themselves don't appear... it's just blank console space. What could I be missing?

package Parse;

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLParse {
  public static void main(String[] args) throws Exception {
    File file = new File("c:test/returnfeed.xml");
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(file);

    NodeList nodes = doc.getElementsByTagName("video");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element element = (Element) nodes.item(i);
      NodeList title = element.getElementsByTagName("videoURL");
      Element line = (Element) title.item(0);
      System.out.println("Links: " + getCharacterDataFromElement(line));
    }
  }
  public static String getCharacterDataFromElement(Element e) {
    Node child = e.getFirstChild();
    if (child instanceof CharacterData) {
      CharacterData cd = (CharacterData) child;
      return cd.getData();
    }
    return "";
  }
}

Result:

Links: 

Links: 

Links: 

Links: 

Links: 

Links: 

Links: 

Sample XML (not the full document):

<?xml version="1.0" ?> 
<response xmlns:uma="http://websiteremoved.com/" version="1.0">

    <timestamp>
        <![CDATA[  July 18, 2012 5:52:33 PM PDT 
          ]]> 
    </timestamp>
    <resultsOffset>
        <![CDATA[  0 
          ]]> 
    </resultsOffset>
    <status>
        <![CDATA[  success 
        ]]> 
    </status>
    <resultsLimit>
        <![CDATA[  207 
        ]]> 
    </resultsLimit>
    <resultsCount>
        <![CDATA[  207 
        ]]> 
    </resultsCount>
    <videoCollection>
        <name>
            <![CDATA[  Video API 
            ]]> 
        </name>
        <count>
            <![CDATA[  207 
            ]]> 
        </count>
        <description>
            <![CDATA[  
            ]]> 
        </description>
        <videos>
            <video>
                <id>
                    <![CDATA[  8177840 
                    ]]> 
                </id>
                <headline>
                    <![CDATA[  Test1
                    ]]> 
                </headline>
                <shortHeadline>
                    <![CDATA[  Test2
                    ]]> 
                </shortHeadline>
                <description>
                    <![CDATA[ Test3

                    ]]> 
                </description>
                <shortDescription>
                    <![CDATA[ Test4

                    ]]> 
                </shortDescription>
                <posterImage>
                    <![CDATA[ http://a.com.com/media/motion/2012/0718/los_120718_los_bucher_on_howard.jpg

                    ]]> 
                </posterImage>
                <videoURL>
                    <![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4

                    ]]> 
                </videoURL>
            </video>
        </videos>
    </videoCollection>
</response>
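To see where the blank output comes from, one can dump the direct children of a <videoURL> element. The sketch below is an illustration, not part of the original post: the class name ChildNodeDump, the helper names, and the trimmed-down inline XML with its placeholder example.com URL are all hypothetical stand-ins for the real feed file. With the default (non-coalescing) DocumentBuilderFactory, the element has three children: a whitespace text node, the CDATA section, and another whitespace text node.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ChildNodeDump {
  // Hypothetical, trimmed-down copy of one <video> entry (placeholder URL).
  static final String SAMPLE =
      "<video>\n"
    + "  <videoURL>\n"
    + "    <![CDATA[ http://example.com/clip.mp4\n"
    + "    ]]>\n"
    + "  </videoURL>\n"
    + "</video>";

  // Returns the DOM node name of every direct child of the first <videoURL>.
  static List<String> childNodeNames(String xml) {
    Document doc = parse(xml);
    Element videoUrl = (Element) doc.getElementsByTagName("videoURL").item(0);
    NodeList children = videoUrl.getChildNodes();
    List<String> names = new ArrayList<>();
    for (int i = 0; i < children.getLength(); i++) {
      names.add(children.item(i).getNodeName());
    }
    return names;
  }

  // Parses an XML string, wrapping the checked parser exceptions.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    // The first child is the whitespace before <![CDATA[, which is what
    // getFirstChild() in the question picks up.
    System.out.println(childNodeNames(SAMPLE));
  }
}
```

Because that first whitespace text node is itself a CharacterData, the question's getCharacterDataFromElement returns only whitespace, which is why each "Links:" line looks empty.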
asked Jul 19 '12 03:07 by Matt




2 Answers

Instead of checking only the first child, it would be prudent to check whether the node has other children as well. In your case (and I guess if you had debugged that node, you would have seen this), the element passed to getCharacterDataFromElement has multiple children. Its first child is the whitespace text node that sits before the CDATA section, and since a text node is also a CharacterData, your method returns just that whitespace. I updated the code, and it should point you in the right direction:

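public static String getCharacterDataFromElement(Element e) {
    NodeList list = e.getChildNodes();

    // Skip whitespace-only text nodes and return the first CharacterData
    // child (Text or CDATASection) that actually contains something.
    for (int index = 0; index < list.getLength(); index++) {
        if (list.item(index) instanceof CharacterData) {
            CharacterData child = (CharacterData) list.item(index);
            String data = child.getData();

            if (data != null && data.trim().length() > 0) {
                // Note: surrounding whitespace is kept; call trim() if needed.
                return data;
            }
        }
    }
    return "";
}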
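Putting the answer's helper back into the question's loop gives a small end-to-end program. This is a sketch under stated assumptions: the class name VideoLinks, the inline FEED string and its example.com URLs are hypothetical stand-ins for the real returnfeed.xml, and unlike the answer's version this helper trims the string it returns.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class VideoLinks {
  // Hypothetical feed shaped like the question's XML; URLs are placeholders.
  static final String FEED =
      "<response><videos>\n"
    + "  <video><videoURL>\n"
    + "    <![CDATA[ http://example.com/a.mp4\n    ]]>\n"
    + "  </videoURL></video>\n"
    + "  <video><videoURL>\n"
    + "    <![CDATA[ http://example.com/b.mp4\n    ]]>\n"
    + "  </videoURL></video>\n"
    + "</videos></response>";

  // The answer's approach: first non-blank CharacterData child, here trimmed.
  static String getCharacterDataFromElement(Element e) {
    NodeList list = e.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
      if (list.item(i) instanceof CharacterData) {
        String data = ((CharacterData) list.item(i)).getData();
        if (data != null && !data.trim().isEmpty()) {
          return data.trim();
        }
      }
    }
    return "";
  }

  // Same loop as the question, collecting the links instead of printing them.
  static List<String> links(String xml) {
    Document doc = parse(xml);
    NodeList videos = doc.getElementsByTagName("video");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < videos.getLength(); i++) {
      Element video = (Element) videos.item(i);
      Element url = (Element) video.getElementsByTagName("videoURL").item(0);
      result.add(getCharacterDataFromElement(url));
    }
    return result;
  }

  // Parses an XML string, wrapping the checked parser exceptions.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    for (String link : links(FEED)) {
      System.out.println("Links: " + link);
    }
  }
}
```

For the real feed, the question's `builder.parse(file)` call can replace the inline string; the rest of the loop stays the same.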
answered Oct 05 '22 21:10 by Sujay


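I would consider using getTextContent(). Called on an element, it returns the contents of the child text and CDATA nodes concatenated together, including the surrounding whitespace, so you will probably want to trim() the result: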

String link = line.getTextContent().trim(); // line is the <videoURL> element from the question
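A minimal sketch of the getTextContent() route applied to every <videoURL> element. The class name TextContentLinks, the inline FEED string and its example.com URLs are hypothetical placeholders, not from the original post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TextContentLinks {
  // Hypothetical feed shaped like the question's XML; URLs are placeholders.
  static final String FEED =
      "<response><videos>\n"
    + "  <video><videoURL>\n"
    + "    <![CDATA[ http://example.com/a.mp4\n    ]]>\n"
    + "  </videoURL></video>\n"
    + "  <video><videoURL>\n"
    + "    <![CDATA[ http://example.com/b.mp4\n    ]]>\n"
    + "  </videoURL></video>\n"
    + "</videos></response>";

  static List<String> links(String xml) {
    Document doc = parse(xml);
    NodeList urls = doc.getElementsByTagName("videoURL");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < urls.getLength(); i++) {
      // getTextContent() joins the whitespace text nodes and the CDATA
      // section, so the surrounding whitespace has to be trimmed off.
      result.add(urls.item(i).getTextContent().trim());
    }
    return result;
  }

  // Parses an XML string, wrapping the checked parser exceptions.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    for (String link : links(FEED)) {
      System.out.println("Links: " + link);
    }
  }
}
```

Compared with walking the child nodes by hand, this avoids caring how many text and CDATA nodes the parser produced, at the cost of always needing the trim().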
answered Oct 05 '22 20:10 by armagedescu