R, Right xpath expression when using XML and xpathSApply

Tags:

r

xml

xml-parsing

Let's say I parsed a website using the expression below:

library(XML)
url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = T)

If I run the code below,

xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

I get the following:

[1] "Description"                      "What's new"                      
[3] "Permissions"                      "More Apps by King.com All Apps  »"
[5] "Customers Also Viewed"            "Customers Also Installed"       

Now, I'm only interested in the "Customers Also Installed" part. But when I run the code below,

xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

it returns all the apps listed under "More Apps by King.com All Apps", "Customers Also Viewed" and "Customers Also Installed".

So I tried,

xpathSApply(url.df_1, "//div[h3='Customers Also Installed']", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

but this didn't work. So I tried

xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]",xmlValue)

but this doesn't work either. The output should look something like this:

 [,1]                                                
[1,] "Christmas Candy Free\n    Daniel Development\n    "
[2,] "/app/android/xmas.candy.free/"                     
 [,2]                                           
[1,] "Jewel Candy Maker\n    Nutty Apps\n    "      
[2,] "/app/android/com.candy.maker.jewel.nuttyapps/"
 [,3]                                      
[1,] "Pogz 2\n    Terry Paton\n    "           
[2,] "/app/android/com.terrypaton.unity.pogz2/"

Any guidance will be much appreciated!

user1486507 asked Apr 04 '13 08:04


1 Answer

Here is one option (you were really close):

xpathSApply(url.df_1,"//div[contains(.,'Customers Also Installed')]/*/li/a",xmlGetAttr,'href')

[1] "/app/android/xmas.candy.free/"                
[2] "/app/android/com.candy.maker.jewel.nuttyapps/"
[3] "/app/android/com.terrypaton.unity.pogz2/"  
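
Since the original page may no longer be reachable, here is a minimal sketch of the same idea on an inline HTML string. The markup, the "Viewed One" entry and its link are hypothetical stand-ins, assuming each section is a div holding an h3 heading followed by a ul of links, as the question's XPath suggests. It also uses a slightly stricter predicate on the h3 child (rather than on the whole div text) and returns both the link text and the href, which is what the question asked for:

```r
library(XML)

# Hypothetical stand-in for the page: two sections with the same class,
# each holding an <h3> heading and a <ul> of app links.
html <- '
<html><body>
<div class="app_content_section">
  <h3>Customers Also Viewed</h3>
  <ul><li><a href="/app/android/viewed.one/">Viewed One</a></li></ul>
</div>
<div class="app_content_section">
  <h3>Customers Also Installed</h3>
  <ul>
    <li><a href="/app/android/xmas.candy.free/">Christmas Candy Free</a></li>
    <li><a href="/app/android/com.terrypaton.unity.pogz2/">Pogz 2</a></li>
  </ul>
</div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Keep only the links under the div whose h3 heading matches.
xp <- "//div[h3[contains(., 'Customers Also Installed')]]/ul/li/a"

# Just the href attribute of each matching link.
hrefs <- xpathSApply(doc, xp, xmlGetAttr, "href")

# Link text and href together: one column per link.
res <- xpathSApply(doc, xp, function(x) c(xmlValue(x), xmlGetAttr(x, "href")))
print(res)
```

Filtering on the h3 child avoids matching outer divs that merely contain the heading text somewhere inside them. Whether the real page follows this exact structure is an assumption.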
agstudy answered Oct 03 '22 10:10