
How to Parse XML's Media:Content with PHP?

Tags: php, xml, rss

I've found a great tutorial on how to accomplish most of the work at:

https://www.developphp.com/video/PHP/simpleXML-Tutorial-Learn-to-Parse-XML-Files-and-RSS-Feeds

but I can't understand how to extract media:content images from the feeds. I've read as much info as I can find, but I'm still stuck.

For example, the question “How to get media:content with SimpleXML” suggests using:

foreach ($xml->channel->item as $news){
    $ns_media = $news->children('http://search.yahoo.com/mrss/');
    echo $ns_media->content; // displays "<media:content>"
}

but I can't get it to work.

Here's my script and the feed I'm trying to parse:

<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 10; $i++){
    $title = $xml->channel->item[$i]->title;
    $link = $xml->channel->item[$i]->link;
    $description = $xml->channel->item[$i]->description;
    $pubDate = $xml->channel->item[$i]->pubDate;

    $html .= "<a href='$link'><h3>$title</h3></a>";
    $html .= "$description";
    $html .= "<br />$pubDate<hr />";
}
echo $html;
?>

I don't know where to add this code in the script to make it work. Honestly, I've browsed for hours, but couldn't find a working script that parses media:content.

Can someone help with this?

========================

UPDATE:

Thanks to fusion3k, I got the final code working:

<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 5; $i++){

    $image = $xml->channel->item[$i]->children('media', True)->content->attributes();
    $title = $xml->channel->item[$i]->title;
    $link = $xml->channel->item[$i]->link;
    $description = $xml->channel->item[$i]->description;
    $pubDate = $xml->channel->item[$i]->pubDate;

    $html .= "<img src='$image' alt='$title'>";
    $html .= "<a href='$link'><h3>$title</h3></a>";
    $html .= "$description";
    $html .= "<br />$pubDate<hr />";
}
echo $html;
?>

Basically, all I needed was this one line:

$image = $xml->channel->item[$i]->children('media', True)->content->attributes();

I can't believe it was so hard for a non-techie to find this information online, even after reading dozens of posts and articles. I hope this helps other folks like me :)

reizer asked Mar 08 '16 20:03


2 Answers

To get the 'url' attribute, use the ->attributes() syntax:

$ns_media = $news->children('http://search.yahoo.com/mrss/');

/* Echoes the 'url' attribute: */
echo $ns_media->content->attributes()['url'];
// in PHP < 5.4: $attr = $ns_media->content->attributes(); echo $attr['url'];

/* Stores the 'url' attribute as a string: */
$url = $ns_media->content->attributes()['url']->__toString();
// in PHP < 5.4: $attr = $ns_media->content->attributes(); $url = $attr['url']->__toString();
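
To try this without hitting a live feed, here is a minimal sketch that runs the same lookup against an inline XML string. The sample feed and its URL are made up for illustration; the real WebMD feed may be structured differently.

```php
<?php
// Hypothetical sample feed, for illustration only.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample item</title>
      <media:content url="http://example.com/image.jpg" medium="image" />
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($feed);
$news = $xml->channel->item[0];

// Select the children of <item> that belong to the Media RSS namespace.
$ns_media = $news->children('http://search.yahoo.com/mrss/');

// attributes() returns the un-namespaced attributes of <media:content>;
// the (string) cast turns the SimpleXMLElement into a plain PHP string.
$url = (string) $ns_media->content->attributes()['url'];

echo $url, PHP_EOL;
```

The (string) cast does the same job as ->__toString() above; which one to use is a matter of taste.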

Namespaces explanation:

The argument to ->children() is not the URL of your XML; it is a namespace URI.

XML namespaces are used for providing uniquely named elements and attributes in an XML document:

<xxx>       Standard XML tag
<yyy:zzz>   Namespaced tag
 └┬┘ └┬┘
  │   └──── Element Name
  └──────── Element Prefix (Namespace Identifier)

So, in your case, <media:content> is the “content” element in the “media” namespace. Namespaced elements must have an associated namespace URI, declared as an attribute of an ancestor node or, most commonly, of the root element. This attribute has the form xmlns:yyy="NamespaceURI" (in your case, xmlns:media="http://search.yahoo.com/mrss/" on the root node <rss>).
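
If you're not sure which namespaces a feed declares, SimpleXML can list them for you. A minimal sketch, assuming a made-up inline feed (the sample XML is an assumption, not the real WebMD feed):

```php
<?php
// Hypothetical sample feed, for illustration only.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample item</title>
      <media:content url="http://example.com/image.jpg" medium="image" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

// Returns the namespaces declared on the root node as prefix => URI.
$namespaces = $xml->getDocNamespaces();

foreach ($namespaces as $prefix => $uri) {
    echo "$prefix => $uri", PHP_EOL;
}
```

Each key is a prefix you can pass to ->children( $prefix, True ), and each value is the matching namespace URI.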

Ultimately, the above $news->children( 'http://search.yahoo.com/mrss/' ) means “retrieve all child elements with http://search.yahoo.com/mrss/ as namespace URI”. An alternative, more readable syntax is $news->children( 'media', True ) (True means “treat the first argument as a prefix”).

Returning to the example code, the generic syntax to retrieve all of the first item's children with the media prefix is:

$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'http://search.yahoo.com/mrss/' );

or (identical result):

$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'media', True );
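
To see the two forms side by side, here is a small sketch against a made-up inline feed (the sample XML is hypothetical). It also shows that a plain ->children() call skips the prefixed media element, which is likely why the original script never saw it.

```php
<?php
// Hypothetical sample feed, for illustration only.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample item</title>
      <media:content url="http://example.com/image.jpg" medium="image" />
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($feed);
$item = $xml->channel->item[0];

$byUri    = $item->children('http://search.yahoo.com/mrss/'); // by namespace URI
$byPrefix = $item->children('media', true);                   // by prefix
$plain    = $item->children();                                // no namespace given

$urlByUri    = (string) $byUri->content->attributes()['url'];
$urlByPrefix = (string) $byPrefix->content->attributes()['url'];

// Collect the element names returned by the plain call.
$plainNames = [];
foreach ($plain as $child) {
    $plainNames[] = $child->getName();
}

echo $urlByUri, PHP_EOL;
echo $urlByPrefix, PHP_EOL;
echo implode(', ', $plainNames), PHP_EOL;
```

The prefix form depends on the prefix the feed happens to use, while the URI form keeps working if a feed declares the same namespace under a different prefix.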

Your new code:

If you want to show the <media:content url> thumbnail for each item on your page, modify the original code in this way:

(...)
$pubDate = $xml->channel->item[$i]->pubDate;
$image   = $xml->channel->item[$i]->children( 'media', True )->content->attributes()['url'];
// in PHP < 5.4:
// $attr  = $xml->channel->item[$i]->children( 'media', True )->content->attributes();
// $image = $attr['url'];

$html   .= "<a href='$link'><h3>$title</h3></a>";
$html   .= "<img src='$image' alt='$title'>";
(...)
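
Not every item in a feed necessarily carries a <media:content> element, and feed text can contain characters that break HTML attributes. Here is a hedged sketch of the same loop with a guard and escaping added; the sample feed is made up, and the isset() check and htmlspecialchars() calls are additions, not part of the original script.

```php
<?php
// Hypothetical sample feed: the second item has no <media:content>.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>With image</title>
      <link>http://example.com/a</link>
      <media:content url="http://example.com/a.jpg" medium="image" />
    </item>
    <item>
      <title>Without image</title>
      <link>http://example.com/b</link>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($feed);
$html = '';

foreach ($xml->channel->item as $item) {
    // Escape feed values before placing them into HTML.
    $title = htmlspecialchars((string) $item->title, ENT_QUOTES);
    $link  = htmlspecialchars((string) $item->link, ENT_QUOTES);

    $media = $item->children('media', true);

    $html .= "<a href='$link'><h3>$title</h3></a>";

    // Only emit <img> when the item actually has a <media:content> element.
    if (isset($media->content)) {
        $image = htmlspecialchars((string) $media->content->attributes()['url'], ENT_QUOTES);
        $html .= "<img src='$image' alt='$title'>";
    }

    $html .= "<hr />";
}

echo $html;
```

Iterating with foreach instead of a fixed counter also avoids reading past the end of a feed that has fewer items than expected.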

fusion3k answered Sep 20 '22 21:09


Simple example for newbs like me:

$url = "https://www.youtube.com/feeds/videos.xml?channel_id=UCwNPPl_oX8oUtKVMLxL13jg";
$rss = simplexml_load_file($url);

foreach($rss->entry as $item) {

  $time = $item->published;
  $time = date('Y-m-d \ H:i', strtotime($time));

  $media_group = $item->children( 'media', true );
  $title = $media_group->group->title;
  $description = $media_group->group->description;
  $views = $media_group->group->community->statistics->attributes()['views'];

  // Print inside the loop so every entry is shown, not just the last one
  echo $time . ' :: ' . $title . '<br>' . $description . '<br>' . $views . '<br>';
}

Pastuh answered Sep 19 '22 21:09