I parsed an HTML page using the following code and got the results below:
library(XML)
url <- htmlTreeParse("http://www.appannie.com/app/ios/candy-crush-saga/", useInternalNodes = TRUE)
ItemList <- getNodeSet(url, "//li/a/@title")
> ItemList
[[1]]
title
"Angry Birds Star Wars HD"
attr(,"class")
[1] "XMLAttributeValue"
[[2]]
title
"iShuffle Bowling 2"
attr(,"class")
[1] "XMLAttributeValue"
....
[[15]]
title
"Angry Birds Star Wars Free"
attr(,"class")
[1] "XMLAttributeValue"
attr(,"class")
[1] "XMLNodeSet"
My issue is that I'd like to extract the names of the games from this result. So I tried this code (based on my experience with xmlValue):
IL <- lapply(ItemList, function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
But it ends up giving this error :
Error in UseMethod("xmlValue") : no applicable method for 'xmlValue' applied to an object of class "XMLAttributeValue"
I googled extensively but could not find a way to deal with XMLAttributeValue. Can someone give me a hint and explain the difference between xmlValue and XMLAttributeValue?
Thanks for the updated question and added example URL!
I think that by selecting @title in the XPath you are already retrieving attribute values rather than nodes, which is why xmlValue (which expects a node) fails. What about selecting the nodes themselves, e.g.:
> url <- htmlTreeParse("http://www.appannie.com/app/ios/candy-crush-saga/", useInternalNodes = TRUE)
> xpathSApply(url, "//li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
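To illustrate the node-versus-attribute distinction without depending on the live page, here is a minimal sketch that parses a small inline HTML fragment. The sample_html string and the variable names are made up for illustration and are not the actual App Annie markup. It selects the a nodes and reads the title attribute from each with xmlGetAttr, instead of calling xmlValue on an attribute value.

```r
library(XML)

# Hypothetical stand-in for the real page; this markup is an assumption.
sample_html <- '<html><body><ul>
  <li><a href="/app/1/" title="Angry Birds Star Wars HD">Angry Birds</a></li>
  <li><a href="/app/2/" title="iShuffle Bowling 2">iShuffle</a></li>
</ul></body></html>'

doc <- htmlParse(sample_html, asText = TRUE)

# Select the <a> nodes themselves, then read the attribute from each node.
titles <- xpathSApply(doc, "//li/a", xmlGetAttr, "title")

# xmlValue() works here because it is applied to nodes, not attribute values.
link_text <- xpathSApply(doc, "//li/a", xmlValue)
```

With this approach the XPath stays at the node level, so both the text content and any attribute can be read from the same selection.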
Update: to filter your results, you might restrict the xpathSApply call to the "Customers Also Bought" div:
> xpathSApply(url, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
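The effect of the class predicate can be sketched on a toy document. The markup below is an assumption for illustration (only the class name app_content_section comes from the XPath above), and links_in_section and link_table are hypothetical names. The sketch uses xpathApply, which returns a list, and binds the results into one row per link.

```r
library(XML)

# Hypothetical page with two lists; only one sits in the target div.
sample_html <- '<html><body>
  <div class="nav"><ul>
    <li><a href="/home/">Home</a></li>
  </ul></div>
  <div class="app_content_section"><ul>
    <li><a href="/app/1/">Angry Birds Star Wars HD</a></li>
    <li><a href="/app/2/">iShuffle Bowling 2</a></li>
  </ul></div>
</body></html>'

doc <- htmlParse(sample_html, asText = TRUE)

# The [@class='...'] predicate keeps only links inside the matching div.
links_in_section <- xpathApply(
  doc,
  "//div[@class='app_content_section']/ul/li/a",
  function(x) c(name = xmlValue(x), href = xmlAttrs(x)[["href"]])
)

# One row per link, with columns "name" and "href".
link_table <- do.call(rbind, links_in_section)
```

Note that this predicate matches the class attribute exactly, so a div carrying additional class names would need a contains()-style test instead.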