
Scraping javascript website in R

I want to scrape the match time and date from this url:

http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary

By using the chrome dev tools, I can see this appears to be generated using the following code:

<td colspan="3" id="utime" class="mstat-date">01:20 AM, October 29, 2014</td>

But this is not in the source html.

I think this is because the content is rendered with JavaScript (correct me if I'm wrong). How can I scrape this information using R?

asked Oct 29 '14 by Liam Flynn



2 Answers

So, RSelenium is no longer the only answer. If you can install the PhantomJS binary (grab it from http://phantomjs.org/), you can use it to render the HTML and scrape the result with rvest (similar to the RSelenium approach, but without requiring Java):

library(rvest)

# render HTML from the site with phantomjs

url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"

writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
    console.log(page.content); //page source
    phantom.exit();
});", url), con="scrape.js")

system("phantomjs scrape.js > scrape.html", intern = TRUE)

# extract the content you need
pg <- read_html("scrape.html")
pg %>% html_nodes("#utime") %>% html_text()

## [1] "10:20 AM, October 28, 2014"
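Once you have the text, you may want an actual date-time rather than a string. A minimal sketch in base R (assuming an English locale, since `%p` and `%B` match locale-specific AM/PM and month names):

```r
# parse the scraped string into a POSIXct date-time (base R, no extra packages)
ts <- "10:20 AM, October 28, 2014"
dt <- as.POSIXct(ts, format = "%I:%M %p, %B %d, %Y", tz = "UTC")
format(dt, "%Y-%m-%d %H:%M")
## [1] "2014-10-28 10:20"
```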
answered by hrbrmstr


You could also use Docker to run the Selenium server (in place of installing Selenium and Java directly). You will still need Docker installed. Then run:

library(RSelenium)
library(rvest)

url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"

system('docker run -d -p 4445:4444 selenium/standalone-chrome')
Sys.sleep(5) # give the container a moment to start

remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "chrome")
remDr$open()
remDr$navigate(url)

# extract the content you need from the page the driver rendered
pg <- read_html(remDr$getPageSource()[[1]])
pg %>% html_nodes("#utime") %>% html_text()

# [1] "10:20 AM, October 28, 2014"
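When you're done, it's tidy to close the browser session and stop the container. A hedged sketch (the `docker ps --filter` query assumes the container was started from the `selenium/standalone-chrome` image as above, and that it is the only one):

```r
remDr$close()  # end the remote browser session
# stop the container started earlier (assumes only one from this image is running)
system('docker stop $(docker ps -q --filter ancestor=selenium/standalone-chrome)')
```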
answered by stevec