Parse RSS feed using XML package in R

Tags: r, xml, xml-parsing

I am trying to scrape and parse the RSS feed at http://www.huffingtonpost.com/rss/liveblog/liveblog-1213.xml but, despite reading other questions about R and XML, I have been unable to make any progress. The XML for each entry looks like this:

    <item>
      <title><![CDATA[Five Rockets Intercepted By Iron Drone Systems Over Be'er Sheva]]></title>
      <link>http://www.huffingtonpost.co.uk/2012/11/15/tel-aviv-gaza-rocket_n_2138159.html#2_five-rockets-intercepted-by-iron-drone-systems-over-beer-sheva</link>
      <description><![CDATA[<a href="http://www.haaretz.com/news/diplomacy-defense/live-blog-rockets-strike-tel-aviv-area-three-israelis-killed-in-attack-on-south-1.477960" target="_hplink">Haaretz reports</a> that five more rockets intercepted by Iron Dome systems over Be'er Sheva. In total, there have been 274 rockets fired and 105 intercepted. The IDF has attacked 250 targets in Gaza.]]></description>
      <guid>http://www.huffingtonpost.co.uk/2012/11/15/tel-aviv-gaza-rocket_n_2138159.html#2_five-rockets-intercepted-by-iron-drone-systems-over-beer-sheva</guid>
      <pubDate>2012-11-15T12:56:09-05:00</pubDate>
      <source url="http://huffingtonpost.com/rss/liveblog/liveblog-1213.xml">Huffingtonpost.com</source>
    </item>

For each entry/post I want to record "Date" (pubDate), "Title" (title), and "Description" (the full text, cleaned). I have tried to use the XML package in R, but I confess I am a bit of a newbie (little to no experience working with XML, but some R experience). The code I am working from, and getting nowhere with, is:

    library(XML)

    xml.url <- "http://www.huffingtonpost.com/rss/liveblog/liveblog-1213.xml"

    # Use the xmlTreeParse function to parse the XML file directly from the web
    xmlfile <- xmlTreeParse(xml.url)

    # Use the xmlRoot function to access the top node
    xmltop <- xmlRoot(xmlfile)

    xmlName(xmltop)

    names(xmltop[[1]])

            title          link   description      language     copyright
          "title"        "link" "description"    "language"   "copyright"
         category     generator          docs          item          item
       "category"   "generator"        "docs"        "item"        "item"

However, whenever I try to manipulate the "title" or "description" information, I continually get errors. Any help troubleshooting this code would be most appreciated.

Thanks, Thomas

asked Nov 20 '12 06:11

Thomas

1 Answer

I am using the excellent RCurl library together with xpathSApply from the XML package.

This script gives you three character vectors (titles, descriptions and pubdates):

    library(RCurl)
    library(XML)

    xml.url <- "http://www.huffingtonpost.com/rss/liveblog/liveblog-1213.xml"

    # Download the feed as text, then parse it into an internal document for XPath queries
    script <- getURL(xml.url)
    doc    <- xmlParse(script)

    # One character vector per field, with one element per <item>
    titles       <- xpathSApply(doc, '//item/title', xmlValue)
    descriptions <- xpathSApply(doc, '//item/description', xmlValue)
    pubdates     <- xpathSApply(doc, '//item/pubDate', xmlValue)
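
Since the question asks for one record per post with a date, a title and a cleaned description, the three vectors could be combined roughly as follows. This is only a sketch under a few assumptions: it parses an inline sample shaped like the item in the question (text abridged, link replaced with a placeholder) instead of downloading the feed, and `strip_html` and `parse_pubdate` are hypothetical helper names, not functions from the XML package.

```r
library(XML)

# Inline sample shaped like the <item> in the question (abridged; placeholder link).
rss <- '<rss version="2.0"><channel>
  <title>Sample feed</title>
  <item>
    <title><![CDATA[Five Rockets Intercepted]]></title>
    <description><![CDATA[<a href="http://example.com/report" target="_hplink">Haaretz reports</a> that five more rockets intercepted by Iron Dome systems.]]></description>
    <pubDate>2012-11-15T12:56:09-05:00</pubDate>
  </item>
</channel></rss>'

doc <- xmlParse(rss, asText = TRUE)

titles       <- xpathSApply(doc, "//item/title", xmlValue)
descriptions <- xpathSApply(doc, "//item/description", xmlValue)
pubdates     <- xpathSApply(doc, "//item/pubDate", xmlValue)

# Hypothetical helper: parse the description as an HTML fragment and keep
# only its text content, which drops the <a> markup.
strip_html <- function(x) {
  frag <- htmlParse(x, asText = TRUE, encoding = "UTF-8")
  trimws(xmlValue(xmlRoot(frag)))
}

# Hypothetical helper: the feed writes offsets as "-05:00", while strptime's
# %z expects "-0500", so remove the colon before parsing.
parse_pubdate <- function(x) {
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

feed <- data.frame(
  Date        = parse_pubdate(pubdates),
  Title       = titles,
  Description = vapply(descriptions, strip_html, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(feed)
```

With the live feed, the same steps should apply to the `doc` object created from `getURL`. Items with an empty description may need separate handling before they are passed to `strip_html`.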

answered Oct 06 '22 10:10

agstudy

