
Extract data from raw HTML in R

I am trying to extract the values from all the tabs on this page: http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm

I first tried downloading the data as an Excel file, but that was not possible; I can only download it as a text file. If I try reading directly from the web page, I get the raw HTML. I am stuck on how to extract these values. Here is the code I have tried so far:

library(RCurl)
library(XML)
url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
# Saves the raw HTML of the page to a local file
download.file(url = url, destfile = "E:\\indiaprecip")
Arun Raja asked Oct 25 '25 01:10



1 Answer

Just use the htmlTreeParse function from the XML package:

library(XML)
html <- htmlTreeParse("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm",
                      useInternalNodes = TRUE)
xpathSApply(html, "//meta/@name")

In your case, however, there is another problem: the data you want is located inside an HTML frame, so it is not part of the page you downloaded. The code below reads the frame's source and fetches the data from there:

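library(XML)
library(RCurl)
url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
html <- htmlTreeParse(url, useInternalNodes = TRUE)
# Build the absolute URL of the first frame from its relative src attribute
frameUrl <- paste("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/",
                  xpathSApply(html, "//frame[1]/@src"),
                  sep = "")

# Fetch the frame's HTML, sending the parent page as the referer
htmlWithData <- getURL(frameUrl,
                       httpheader = c("User-Agent" = "RCurl",
                                      "Referer" = url))

# Parse the downloaded string (text, not a URL) and select the tables in the body
dataXml <- htmlTreeParse(htmlWithData, asText = TRUE, useInternalNodes = TRUE)
xpathSApply(dataXml, "//body/table")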
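
The last call returns table nodes rather than values. One possible way to turn such nodes into data frames is readHTMLTable from the same XML package. The sketch below runs on a small inline HTML string instead of the live page, so the column names and rows are made up for illustration and the real frame's layout may well differ:

```r
library(XML)

# Hypothetical stand-in for the frame's HTML; names and values are invented
sample_html <- '<html><body>
<table>
  <tr><th>Subdivision</th><th>Actual</th><th>Normal</th></tr>
  <tr><td>Region A</td><td>12.5</td><td>10.0</td></tr>
  <tr><td>Region B</td><td>3.2</td><td>4.1</td></tr>
</table>
</body></html>'

doc <- htmlParse(sample_html, asText = TRUE)

# readHTMLTable returns a list with one data frame per <table> element
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
rain <- tables[[1]]

# Cells are read as character strings, so convert numeric columns explicitly
rain$Actual <- as.numeric(rain$Actual)
rain$Normal <- as.numeric(rain$Normal)
print(rain)
```

With the real page, you would pass dataXml in place of doc and then check which element of the resulting list holds the table you need.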
Santyago answered Oct 26 '25 15:10
