Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using SimpleXML to read RSS feed

I am using PHP and simpleXML to read the following rss feed:

http://feeds.bbci.co.uk/news/england/rss.xml

I can get most of the information I want like so:

$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');

echo '<h1>'. $rss->channel->title . '</h1>';

foreach ($rss->channel->item as $item) {
   echo '<h2><a href="'. $item->link .'">' . $item->title . "</a></h2>";
   echo "<p>" . $item->pubDate . "</p>";
   echo "<p>" . $item->description . "</p>";
} 

But how would I output the thumbnail image that is in the following tag:

<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/51078000/jpg/_51078953_226alanpotbury.jpg"/>  
like image 547
geoffs3310 Avatar asked Feb 03 '11 14:02

geoffs3310


2 Answers

As you already know, SimpleXML lets you select an node's child using the object property operator -> or a node's attribute using the array access ['name']. It's great, but the operation only works if what you select belongs to the same namespace.

If you want to "hop" from a namespace to another, you can use the children() or attributes() methods. In your case, this is made a bit trickier because you have <item/> in the global namespace, the node you're looking for is in the "media" namespace* and then the attributes are in the global namespace again (they are not prefixed.) So using the normal object/array notation you'll have to "hop" twice:

foreach ($rss->channel->item as $item)
{
    // we load the attributes into $thumbAttr
    // you can either use the namespace prefix
    $thumbAttr = $item->children('media', true)->thumbnail->attributes();

    // or preferably the namespace name, read note below for an explanation
    $thumbAttr = $item->children('http://search.yahoo.com/mrss/')->thumbnail->attributes();

    echo $thumbAttr['url'];
}

*Note

I refer to the namespace as the "media" namespace but that's not really correct. The namespace name is http://search.yahoo.com/mrss/, and "media" is just a prefix, some sort of alias if you will. What's important to keep in mind is that http://search.yahoo.com/mrss/ is the real name of the namespace. At some point, your RSS provider might decide to change the prefix to, say, "yahoo" and your script will stop working if your script refers to the "media" prefix. However, if you use the namespace name, it will keep working no matter the prefix.

like image 102
Josh Davis Avatar answered Oct 13 '22 02:10

Josh Davis


SimpleXML is pretty bad at handling namespaces. You have two choices: The simplest hack is to simply read the contents of the feed into a string and replace the namespaces;

$feed = file_get_contents('http://feeds.bbci.co.uk/news/england/rss.xml');
$feed = str_replace('<media:', '<', $feed);

$rss = simplexml_load_string($feed);
...

Now you can access the element thumbnail directly.

The more elegant (not really) method is to find out what URI the namespace uses. If you look at the source code for http://feeds.bbci.co.uk/news/england/rss.xml you see that it points to http://search.yahoo.com/mrss/.

Now you can use this URI in the children() method of a SimpleXMLElement to get the contents of the media:thumbnail element;

$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');

foreach ($rss->channel->item as $item) {
    $media = $item->children('http://search.yahoo.com/mrss/');
    ...
}
like image 32
Björn Avatar answered Oct 13 '22 02:10

Björn