I am trying to extract the values from all the tabs on this page: http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm
I first tried downloading the page as an Excel file, but that was not possible; I can only download it as a text file. If I read directly from the web page, I get the raw HTML. I am stuck on how to extract these values. Here is the code I have tried so far:
library(RCurl)
require(XML)
url = "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
download.file(url = url, destfile = "E:\\indiaprecip")
Just use the "htmlTreeParse" function from the XML package:
library(XML)
html <- htmlTreeParse("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm",
useInternalNodes = T)
xpathSApply(html, "//meta/@name")
But in your case there is another problem: the data you want to access is located in an HTML frame. The code below can help you read it:
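library(XML)
library(RCurl)
url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
# Parse the outer page, which only holds the frameset
html <- htmlTreeParse(url, useInternalNodes = TRUE)
# Build the absolute URL of the first frame from its relative src attribute
frameUrl <- paste("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/",
                  xpathSApply(html, "//frame[1]/@src"),
                  sep = "")
# Fetch the frame content, sending a User-Agent and the outer page as Referer
htmlWithData <- getURL(frameUrl,
                       httpheader = c("User-Agent" = "RCurl",
                                      "Referer" = url))
# Parse the downloaded HTML string (not a URL) and select the tables in the body
dataXml <- htmlTreeParse(htmlWithData, asText = TRUE, isURL = FALSE,
                         useInternalNodes = TRUE)
xpathSApply(dataXml, "//body/table")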
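The last call returns the table nodes themselves rather than their values. To get the values as data frames, "readHTMLTable" from the same XML package may help. Below is a minimal sketch on a made-up HTML snippet; the column names (Region, Actual, Normal) and the numbers are assumptions for illustration only, and the real frame may be laid out differently and need extra cleaning.

```r
library(XML)

# Hypothetical stand-in for the HTML of the frame; not the real IMD data.
sampleHtml <- "<html><body><table>
<tr><th>Region</th><th>Actual</th><th>Normal</th></tr>
<tr><td>A</td><td>10.5</td><td>12.0</td></tr>
<tr><td>B</td><td>3.2</td><td>4.1</td></tr>
</table></body></html>"

# Parse the string as HTML text
doc <- htmlParse(sampleHtml, asText = TRUE)

# Convert every <table> in the document into a data frame (returns a list)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Take the first table and convert the value columns from text to numbers
rain <- tables[[1]]
rain$Actual <- as.numeric(rain$Actual)
rain$Normal <- as.numeric(rain$Normal)

print(rain)
```

With the real page, you would pass dataXml from the code above to readHTMLTable in place of doc, then inspect the resulting list to see which element holds the table you need.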