Scrape a page in R using rvest or RCurl or httr

Tags: rcurl, httr, rvest

I would like to extract the table from the page below:

https://www.mcxindia.com/market-data/spot-market-price

I have tried rvest and RCurl, but in both cases the page that gets downloaded differs from what I see in the browser. I assume there is some form of redirection that I am unable to detect or follow.

Any help would be appreciated

PS: Not interested in phantomjs

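This is what I have tried so far:

1. httr

library(httr)
library(XML)

base_url <- "https://www.mcxindia.com/market-data/spot-market-price"
# Browser-like headers; Accept is sent as its own header, not inside the user agent.
ua     <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
accept <- "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"

# Request the page with cookies copied from a browser session.
resp <- POST(base_url, user_agent(ua), add_headers(Accept = accept),
             set_cookies(`_ga` = "GA1.2.543290785.1505100652",
                         `_gid` = "GA1.2.1409943545.1505881384", `_gat` = "1"))

# Parse the response body as HTML and read the 7th table.
doc <- htmlParse(content(resp, as = "text", encoding = "UTF-8"), asText = TRUE)
poptable <- readHTMLTable(doc, which = 7)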

Result: No Data Found!!!!

2. RCurl

library(RCurl)
library(XML)

# Reusable handle with browser-like headers, redirects followed, cookies persisted.
curl <- getCurlHandle()
curlSetOpt(curl = curl,
           ssl.verifypeer = FALSE,
           useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
           httpheader = c(Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"),
           timeout = 60,
           followlocation = TRUE,
           cookiejar = "./cookies",
           cookiefile = "./cookies")

# Download the page, parse it as HTML and read the 7th table.
newDoc <- getURL("https://www.mcxindia.com/market-data/spot-market-price", curl = curl)
newDoc <- htmlParse(newDoc, asText = TRUE)
poptable <- readHTMLTable(newDoc, which = 7)

Result: No Data Found!!!!

Also, I would like to know how to download the Excel file (see the small Excel icon on the page).

Sushanta Deb asked Jan 01 '26 18:01


1 Answer

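The table does not appear to be part of the static HTML, which would explain the "No Data Found!!!!" result. The page keeps its rows in a JavaScript variable (vSMP) inside a <script> tag, so the approach here is to evaluate that script with V8 and read the variable back into R: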

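library(rvest)
library(stringi)
library(V8)

# JavaScript context in which the page's data script is evaluated.
ctx <- v8()
pg  <- read_html("https://www.mcxindia.com/market-data/spot-market-price")

# Take the first <script> whose text mentions 'Data', unescape it, remove
# double backslashes, and evaluate it so that vSMP is defined in ctx.
html_nodes(pg, xpath = ".//script[contains(., 'Data')]")[[1]] %>%
  html_text() %>%
  stri_unescape_unicode() %>%
  stri_replace_all_fixed("\\\\", "") %>%
  ctx$eval() -> ignore_the_blank_return_value

# Keep only the columns of interest from vSMP$Data.
data <- ctx$get("vSMP")$Data[, c("Symbol", "TodaysSpotPrice", "Unit")]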
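To see the technique without hitting the site, here is a minimal offline sketch. It assumes the page embeds its rows in a JavaScript variable, as the call to ctx$get("vSMP") above suggests. The HTML string, the symbols and the prices are made-up placeholders, not real MCX data, and the real script may be escaped differently (which is what the stringi steps above deal with).

```r
library(rvest)
library(V8)

# Hypothetical stand-in for the downloaded page: the table body is empty and
# the rows live in a JavaScript variable. All values below are made up.
sample_html <- '
<html><body>
  <table id="tbl"><tbody></tbody></table>
  <script>var vSMP = {"Data": [
    {"Symbol": "AAA", "TodaysSpotPrice": 100.5, "Unit": "1 KGS"},
    {"Symbol": "BBB", "TodaysSpotPrice": 200.0, "Unit": "10 GRMS"}
  ]};</script>
</body></html>'

pg <- read_html(sample_html)

# Pull the text of the first <script> that mentions the variable name.
script_text <- html_text(
  html_nodes(pg, xpath = ".//script[contains(., 'vSMP')]")[[1]]
)

# Evaluate the script in a V8 context, then copy the variable back into R.
ctx <- v8()
ctx$eval(script_text)
data <- ctx$get("vSMP")$Data[, c("Symbol", "TodaysSpotPrice", "Unit")]

print(data)
```

The array of objects is converted to a data frame on the way back from V8, which is why the column subsetting works.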

Enjoy!!!
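The Excel download itself is not covered above. As a possible workaround, once the values are in a data frame they can be written to a CSV file, which Excel opens directly. This is only a sketch: the data frame below is a made-up stand-in for the `data` object from the code above, and the output path is a temporary file chosen for illustration.

```r
# Hypothetical stand-in for the `data` frame produced above; values are made up.
data <- data.frame(
  Symbol           = c("AAA", "BBB"),
  TodaysSpotPrice  = c(100.5, 200),
  Unit             = c("1 KGS", "10 GRMS"),
  stringsAsFactors = FALSE
)

# Write the data frame to a CSV file (a temporary path here; use any path).
out_file <- tempfile(fileext = ".csv")
write.csv(data, out_file, row.names = FALSE)

# Read the file back to confirm the round trip.
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```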

Sushanta Deb answered Jan 06 '26 20:01


