Let's say I parsed an website using below expression
library(XML)
url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = T)
if I run below code,
xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
I will get below -
[1] "Description" "What's new"
[3] "Permissions" "More Apps by King.com All Apps »"
[5] "Customers Also Viewed" "Customers Also Installed"
Now, what I'm interested in is only "Customers Also Installed" part. But when I run the below code,
xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
it spits out the all the apps included in "More Apps by King.com All Apps" , "Customers Also Viewed" and "Customers Also Installed".
So I tried,
xpathSApply(url.df_1, "//div[h3='Customers Also Installed']”, function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
but this didn't work. So I tried
xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]",xmlValue)
but this doesn't work either. (The output should be something like below-)
[,1]
[1,] "Christmas Candy Free\n Daniel Development\n "
[2,] "/app/android/xmas.candy.free/"
[,2]
[1,] "Jewel Candy Maker\n Nutty Apps\n "
[2,] "/app/android/com.candy.maker.jewel.nuttyapps/"
[,3]
[1,] "Pogz 2\n Terry Paton\n "
[2,] "/app/android/com.terrypaton.unity.pogz2/"
Any guidance will be much appreciated!
Here is one option (you was really close):
xpathSApply(url.df_1,"//div[contains(.,'Customers Also Installed')]/*/li/a",xmlGetAttr,'href')
[1] "/app/android/xmas.candy.free/"
[2] "/app/android/com.candy.maker.jewel.nuttyapps/"
[3] "/app/android/com.terrypaton.unity.pogz2/"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With