I'm trying to parse CDATA types in XML. The code runs fine and prints "Links:" in the console (about 50 times, because that's how many links I have), but the links themselves don't appear; there is just blank space in the console. What could I be missing?
package Parse;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLParse {
    public static void main(String[] args) throws Exception {
        File file = new File("c:test/returnfeed.xml");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(file);
        NodeList nodes = doc.getElementsByTagName("video");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            NodeList title = element.getElementsByTagName("videoURL");
            Element line = (Element) title.item(0);
            System.out.println("Links: " + getCharacterDataFromElement(line));
        }
    }

    public static String getCharacterDataFromElement(Element e) {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
        return "";
    }
}
Result:
Links:
Links:
Links:
Links:
Links:
Links:
Links:
Sample XML (not the full document):
<?xml version="1.0" ?>
<response xmlns:uma="http://websiteremoved.com/" version="1.0">
<timestamp>
<![CDATA[ July 18, 2012 5:52:33 PM PDT
]]>
</timestamp>
<resultsOffset>
<![CDATA[ 0
]]>
</resultsOffset>
<status>
<![CDATA[ success
]]>
</status>
<resultsLimit>
<![CDATA[ 207
]]>
</resultsLimit>
<resultsCount>
<![CDATA[ 207
]]>
</resultsCount>
<videoCollection>
<name>
<![CDATA[ Video API
]]>
</name>
<count>
<![CDATA[ 207
]]>
</count>
<description>
<![CDATA[
]]>
</description>
<videos>
<video>
<id>
<![CDATA[ 8177840
]]>
</id>
<headline>
<![CDATA[ Test1
]]>
</headline>
<shortHeadline>
<![CDATA[ Test2
]]>
</shortHeadline>
<description>
<![CDATA[ Test3
]]>
</description>
<shortDescription>
<![CDATA[ Test4
]]>
</shortDescription>
<posterImage>
<![CDATA[ http://a.com.com/media/motion/2012/0718/los_120718_los_bucher_on_howard.jpg
]]>
</posterImage>
<videoURL>
<![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4
]]>
</videoURL>
</video>
</videos>
</videoCollection>
</response>
Solution 1: It looks like the data are Base64 encoded, which is common for binary data in XML documents. In that case you have to pass the CDATA strings to a Base64 decoder; see the Convert.FromBase64String(String) method in .NET's System namespace.
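If a feed really did carry Base64 inside CDATA, the Java counterpart of that .NET method is java.util.Base64. A minimal sketch, assuming the CDATA text is plain Base64 with only surrounding whitespace; the class name and sample value are illustrative. Note that the videoURL values in the question's sample are plain-text URLs, not Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CdataBase64Demo {

    // Decodes Base64 text taken from a CDATA section. The whitespace around
    // it (line breaks/indentation in the feed) is trimmed first, because the
    // basic decoder rejects any character outside the Base64 alphabet.
    static byte[] decodeCdata(String cdata) {
        return Base64.getDecoder().decode(cdata.trim());
    }

    public static void main(String[] args) {
        // Hypothetical CDATA content: " aGVsbG8=" followed by a line break.
        String cdata = " aGVsbG8=\n";
        System.out.println(new String(decodeCdata(cdata), StandardCharsets.UTF_8)); // hello
    }
}
```

Base64.getMimeDecoder() is an alternative if the encoded text is itself wrapped across several lines, since it ignores characters outside the alphabet.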
A CDATA section is used to mark a section of an XML document so that the XML parser interprets it only as character data, not as markup. It comes in handy when XML data needs to be embedded within another XML document.
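To see this with the JDK's DOM parser, here is a minimal sketch (class and method names are illustrative): the tags inside the CDATA section come back as the raw text of a single CDATA node, not as child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataAsTextDemo {

    // Parses an XML string and returns the first child of the root element.
    static Node firstChildOfRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        // <b>...</b> and the bare '&' are only legal here because they sit
        // inside CDATA; the parser does not treat them as markup.
        Node child = firstChildOfRoot("<note><![CDATA[<b>bold</b> & more]]></note>");
        System.out.println(child.getNodeType() == Node.CDATA_SECTION_NODE); // true
        System.out.println(child.getNodeValue()); // <b>bold</b> & more
    }
}
```

With the default factory settings (no coalescing), the CDATA section is kept as its own CDATASection node, which is also a CharacterData.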
No, the markup denoting a CDATA section is not permitted as the value of an attribute (the < that opens it is not allowed inside attribute values).
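A quick way to confirm this with the JDK parser is to check the well-formedness of both variants. This is a minimal sketch with illustrative names; the attribute case fails because a literal < is never allowed inside an attribute value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class CdataInAttributeDemo {

    // Returns true if `xml` is well-formed, false if the parser reports a
    // well-formedness error (SAXParseException).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not well-formedness errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // CDATA as element content: allowed, even with a raw '<' inside.
        System.out.println(isWellFormed("<a><![CDATA[x < y]]></a>"));   // true
        // CDATA markup as an attribute value: rejected.
        System.out.println(isWellFormed("<a b=\"<![CDATA[x]]>\"/>"));   // false
    }
}
```

The JDK parser may also log a "[Fatal Error]" line to standard error for the second case before the exception is thrown.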
Instead of checking only the first child, it would be prudent to check whether the node has other children as well. In your case (and I guess if you had debugged that node, you would have known), the element passed to getCharacterDataFromElement has multiple children: the first one is a whitespace-only text node holding the line break before <![CDATA[, so getData() returns just that whitespace, and the CDATA section is the second child. I updated the code, and this should point you in the right direction:
public static String getCharacterDataFromElement(Element e) {
    NodeList list = e.getChildNodes();
    for (int index = 0; index < list.getLength(); index++) {
        // Text and CDATA nodes are both CharacterData (so are comments).
        if (list.item(index) instanceof CharacterData) {
            CharacterData child = (CharacterData) list.item(index);
            String data = child.getData();
            // Skip whitespace-only nodes such as the line break before <![CDATA[.
            if (data != null && data.trim().length() > 0) {
                return data;
            }
        }
    }
    return "";
}
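To check the fix end to end without the file on disk, here is a minimal self-contained sketch. Assumptions: the XML is read from an inline string instead of c:test/returnfeed.xml, the names VideoUrlDemo, firstNonBlankData and videoUrls are illustrative, and, unlike the method above, the result is trimmed, because the CDATA content itself starts with a space and ends with a line break.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class VideoUrlDemo {

    // Returns the first child that is CharacterData (text or CDATA) and not
    // whitespace-only, trimmed; "" if there is none.
    static String firstNonBlankData(Element e) {
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof CharacterData) {
                String data = ((CharacterData) child).getData();
                if (data != null && !data.trim().isEmpty()) {
                    return data.trim();
                }
            }
        }
        return "";
    }

    // Parses the feed and collects the URL of every <video>/<videoURL>.
    static List<String> videoUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList videos = doc.getElementsByTagName("video");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < videos.getLength(); i++) {
            Element video = (Element) videos.item(i);
            Node url = video.getElementsByTagName("videoURL").item(0);
            if (url != null) {
                urls.add(firstNonBlankData((Element) url));
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // A cut-down copy of the question's feed, keeping the line breaks
        // around the CDATA section that trip up getFirstChild().
        String xml = "<response><videos><video>\n"
                + "<videoURL>\n"
                + "<![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4\n"
                + "]]>\n"
                + "</videoURL>\n"
                + "</video></videos></response>";
        for (String url : videoUrls(xml)) {
            System.out.println("Links: " + url);
        }
    }
}
```

Running main prints the URL after "Links:" instead of a blank line.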
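I would consider using getTextContent(). Called on the <videoURL> element (line in the question's loop), it returns the text of all its descendants, CDATA included, together with the surrounding whitespace, so trim the result:
String string = line.getTextContent().trim();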
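A minimal sketch of the getTextContent() route, assuming the XML comes from an inline string and using illustrative names (TextContentDemo, textOf). It also sets DocumentBuilderFactory.setCoalescing(true), which is optional here: it merges CDATA into adjacent text nodes during parsing but does not change the string getTextContent() returns.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextContentDemo {

    // Returns the trimmed text content of the first element named `tag`,
    // or "" if no such element exists.
    static String textOf(String xml, String tag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Optional: turn CDATA sections into ordinary Text nodes merged with
        // the adjacent whitespace text, leaving a single child node.
        factory.setCoalescing(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node element = doc.getElementsByTagName(tag).item(0);
        // getTextContent() concatenates every text/CDATA descendant, including
        // the line breaks around <![CDATA[ ... ]]>, hence the trim().
        return element == null ? "" : element.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<video>\n<videoURL>\n"
                + "<![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4\n]]>\n"
                + "</videoURL>\n</video>";
        System.out.println("Links: " + textOf(xml, "videoURL"));
    }
}
```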