
Parsing XML CDATA with PHP [closed]

Tags: php, xml, rss

I have a little problem that I can't figure out how to solve. I have an XML file (actually it's RSS) that I'm trying to parse with PHP, but the content of the CDATA sections comes out blank.

Here's the XML Code and here's the PHP file

Everything works fine, except that the description tag is not printing. I would be very grateful if someone could help.

Helen Neely asked Aug 07 '09 20:08



1 Answer

Just out of curiosity, after getting your XML (I hope I didn't destroy it in the process; I'll see if I can edit the OP to correct it):

  • did you cast the description to a string?


What I mean is that you could use this:

$xml = simplexml_load_string($str);
foreach ($xml->channel->item as $item) {
    var_dump($item->description);
}

But that would only get you this:

object(SimpleXMLElement)[5]
object(SimpleXMLElement)[3]

Which is not that nice...


You need to cast the data to a string, like this:

$xml = simplexml_load_string($str);
foreach ($xml->channel->item as $item) {
    var_dump((string)$item->description);
}

And then you get the descriptions:

string '

This is one of the content that I need printed on the screen, but nothing is happening. Please, please...output something... <br /><br /> <b>Showing</b>: 2 weeks<br /> <b>Starting On</b>: August 7, 2009 <br /> <b>Posted On</b>: August 7, 2009 <br />
<a href="http://www.mysite.com">click to view</a> 
            ' (length=329)

string '

Another content...This is another of the content that I need printed on the screen, but nothing is happening. Please, please...output something... <br /><br /> <b>Showing</b>: 2 weeks<br /> Starting On: August 7, 2009 <br /> <b>Posted On</b>: August 7, 2009
; 
               ' (length=303)

(Using trim on those might prove useful, btw, if your XML is indented)
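To put the cast and the trim together, here is a minimal sketch. The feed in $str is a made-up sample (I'm not reproducing the real one here), and the $descriptions array is just a name I picked for collecting the results:

```php
<?php
// Hypothetical sample feed; the real one from the question is not reproduced here.
$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First</title>
      <description>
        <![CDATA[Some <b>HTML</b> content]]>
      </description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($str);

$descriptions = [];
foreach ($xml->channel->item as $item) {
    // The cast turns the SimpleXMLElement into its text content (CDATA included);
    // trim() drops the whitespace that comes from the indentation around the CDATA.
    $descriptions[] = trim((string) $item->description);
}

print_r($descriptions);
```

Without the trim, each string would keep the line breaks and spaces that surround the CDATA section in the indented XML.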


Otherwise... well, we'll probably need your PHP code (at least, it would be useful to know how you are getting to the description tag ;-) )


EDIT

Thanks for the reformatted XML!

If I go to pastebin, in the textarea at the bottom of the page, there is a white space at the beginning of the XML, before the <?xml version="1.0" encoding="utf-8"?>

If you have that whitespace in your real XML data, it will cause problems: the data is not valid XML (the XML declaration has to be the very first thing in the XML data).
You'll get errors like this one:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : XML declaration allowed only at the start of the document

Can you check that?
And, if that is where the problem is, you should activate error_reporting and display_errors ;-) That would help!
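Something along these lines could help to check both points at once. It's only a sketch: the $str value is made-up data with a stray space in front of the declaration, and collecting the messages through libxml_use_internal_errors is my own choice here, not something taken from your script:

```php
<?php
// Show every error while debugging (development settings, not for production).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Collect libxml errors instead of emitting warnings, so they can be inspected.
libxml_use_internal_errors(true);

// Hypothetical data: note the stray space before the XML declaration.
$str = ' <?xml version="1.0" encoding="utf-8"?><rss><channel><title>t</title></channel></rss>';

$broken = simplexml_load_string($str);
$messages = [];
if ($broken === false) {
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
}

// Trimming the data before parsing removes the stray whitespace.
$fixed = simplexml_load_string(trim($str));

var_dump($broken === false, $messages, (string) $fixed->channel->title);
```

If the first load fails and the trimmed one succeeds, the leading whitespace is the culprit, and fixing it where the feed is generated would be cleaner than trimming on every read.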


EDIT after taking a look at the PHP file:

In your for loop, you are doing this to get your description data:

$item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;

The first child of description is probably not the CDATA section itself, I'd say (any whitespace before it counts as a text node of its own); what about using its nodeValue directly?
Like this:

$item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->nodeValue;

It seems to be working better this way :-)

As a side note, you could probably do the same for the other tags, I suppose; for instance, this seems to be working too:

$item_title=$x->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
$item_link=$x->item($i)->getElementsByTagName('link')->item(0)->nodeValue;

What does this give you?
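To see the difference between the two accesses, here is a small sketch on made-up data (a single hypothetical item, with the CDATA indented the way an indented feed would have it):

```php
<?php
// Hypothetical item; the whitespace around the CDATA mimics an indented feed.
$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<item>
  <description>
    <![CDATA[Hello <b>world</b>]]>
  </description>
</item>
XML;

$doc = new DOMDocument();
$doc->loadXML($str);

$description = $doc->getElementsByTagName('description')->item(0);

// First child only: here that is the whitespace text node before the CDATA.
$firstChild = $description->childNodes->item(0)->nodeValue;

// The element's own nodeValue: the text of all its children, CDATA included.
$whole = $description->nodeValue;

var_dump(trim($firstChild), trim($whole));
```

With this sample, the first child holds nothing but whitespace, while the nodeValue of the element itself contains the CDATA text, which would explain a blank description.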


Another EDIT: and here is the code I would probably use:

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($str);         // I changed that because I have the XML data in a string

//get elements from "<channel>"
$channel = $xmlDoc->getElementsByTagName('channel')->item(0);
$channel_title = $channel->getElementsByTagName('title')->item(0)->nodeValue;
$channel_link = $channel->getElementsByTagName('link')->item(0)->nodeValue;
$channel_desc = $channel->getElementsByTagName('description')->item(0)->nodeValue;

//output elements from "<channel>"
echo "<p><a href='" . $channel_link . "'>" . $channel_title . "</a>";
echo "<br />";
echo $channel_desc . "</p>";

//get and output "<item>" elements
$x = $xmlDoc->getElementsByTagName('item');
for ($i=0 ; $i<=1 ; $i++) {
    $item_title = $x->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
    $item_link = $x->item($i)->getElementsByTagName('link')->item(0)->nodeValue;
    $item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->nodeValue;
    echo "<p><a href='" . $item_link . "'>" . $item_title . "</a>";
    echo "<br />";
    echo $item_desc . "</p>";
    echo ' <p />';
}

Note that I have the XML data in a string and don't need to fetch it from a URL, so I'm using the loadXML method and not load.

The major difference is that I removed some childNodes accesses that I feel were not necessary.
Does this seem OK to you?
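One more thought: the for loop above stops after two items. If the feed can hold more (or fewer), a foreach over the node list might be safer. A sketch under that assumption, with a made-up two-item feed in $str and an $items array that is just my naming:

```php
<?php
// Hypothetical two-item feed standing in for the real data.
$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First</title>
      <link>http://www.mysite.com/1</link>
      <description><![CDATA[One <b>bold</b> line]]></description>
    </item>
    <item>
      <title>Second</title>
      <link>http://www.mysite.com/2</link>
      <description>
        <![CDATA[Another line]]>
      </description>
    </item>
  </channel>
</rss>
XML;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($str); // use load() instead when reading from a file or URL

// Collect every <item>, however many the feed contains.
$items = [];
foreach ($xmlDoc->getElementsByTagName('item') as $item) {
    $items[] = [
        'title' => $item->getElementsByTagName('title')->item(0)->nodeValue,
        'link'  => $item->getElementsByTagName('link')->item(0)->nodeValue,
        'desc'  => trim($item->getElementsByTagName('description')->item(0)->nodeValue),
    ];
}

// Same markup as the loop above, one paragraph per item.
foreach ($items as $entry) {
    echo "<p><a href='" . $entry['link'] . "'>" . $entry['title'] . "</a><br />";
    echo $entry['desc'] . "</p>\n";
}
```

This assumes every item has a title, a link and a description; a feed with missing tags would need a check before each nodeValue access.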

Pascal MARTIN answered Oct 01 '22 14:10
