I'm scraping data from https://rotogrinders.com/lineups/nfl?site=draftkings. Currently, I use myData <- read_html("https://rotogrinders.com/lineups/nfl?site=draftkings") to bring in the data, then extract what I want using html_nodes. I'm trying to change the slate selection menu, then grab the data. The XPath for the menu I'm trying to change is //select[@name='slate_name'].
My research leads me to believe I need to use one of the following, but I'm unsure how to go about it, as the menu is not in a form and there is no submit button; the page automatically reloads once a new option is selected:
httr::POST
rvest::html_session
RSelenium
I'm not familiar with the RSelenium library, so ideally I'm looking for a solution using httr or rvest.
You already get all the information via read_html(). The slate-name drop-down just filters the schedules via JavaScript. I would suggest grabbing all the data and filtering it yourself. Hope that helps.
library(magrittr)
library(rvest)
#> Loading required package: xml2
url <- "https://rotogrinders.com/lineups/nfl?site=draftkings"
myData <- read_html(url)
# extract the text of every matchup and collapse repeated whitespace
myData %>%
  html_nodes(".teams") %>%
  html_text() %>%
  stringr::str_squish()
#> [1] "New York NYJ Jets Cleveland CLE Browns"
#> [2] "New Orleans NOS Saints Atlanta ATL Falcons"
#> [3] "Buffalo BUF Bills Minnesota MIN Vikings"
#> [4] "Denver DEN Broncos Baltimore BAL Ravens"
#> [5] "Indianapolis IND Colts Philadelphia PHI Eagles"
#> [6] "Cincinnati CIN Bengals Carolina CAR Panthers"
#> [7] "San Francisco SFO 49ers Kansas City KCC Chiefs"
#> [8] "Green Bay GBP Packers Washington WAS Redskins"
#> [9] "Oakland OAK Raiders Miami MIA Dolphins"
#> [10] "New York NYG Giants Houston HOU Texans"
#> [11] "Tennessee TEN Titans Jacksonville JAC Jaguars"
#> [12] "Los Angeles LAC Chargers Los Angeles LAR Rams"
#> [13] "Chicago CHI Bears Arizona ARI Cardinals"
#> [14] "Dallas DAL Cowboys Seattle SEA Seahawks"
#> [15] "New England NEP Patriots Detroit DET Lions"
#> [16] "Pittsburgh PIT Steelers Tampa Bay TBB Buccaneers"
Created on 2018-09-22 by the reprex package (v0.2.1)
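To illustrate the "filter it yourself" idea, here is a minimal sketch that keeps only the matchups mentioning certain team abbreviations. The helper name filter_matchups is hypothetical (it is not part of rvest), and the input is just a hand-copied subset of the output above, so treat it as a starting point rather than a finished solution.

```r
# A few of the matchup strings from the output above (hand-copied subset).
teams <- c(
  "New York NYJ Jets Cleveland CLE Browns",
  "Dallas DAL Cowboys Seattle SEA Seahawks",
  "New England NEP Patriots Detroit DET Lions"
)

# Hypothetical helper: keep matchups that mention any of the given abbreviations.
filter_matchups <- function(matchups, abbreviations) {
  # Word boundaries so that "DAL" only matches the standalone abbreviation.
  pattern <- paste0("\\b(", paste(abbreviations, collapse = "|"), ")\\b")
  matchups[grepl(pattern, matchups, perl = TRUE)]
}

# Example: the single game named in the "Showdown Captain Mode (DAL vs SEA)" slate.
showdown <- filter_matchups(teams, c("DAL", "SEA"))
print(showdown)
```

Matching on the abbreviation rather than the city avoids ambiguity for cities with two teams, such as New York or Los Angeles.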
EDIT
You still get all the relevant information via read_html(). You need to get the id from the drop-down and then parse the JavaScript string with all the salaries. I did the first part; the rest is up to you ;-)
library(tidyverse, quietly = TRUE)
library(rvest, warn.conflicts = FALSE)
#> Loading required package: xml2
url <- "https://rotogrinders.com/lineups/nfl?site=draftkings"
raw <- read_html(url)
# helper function: keep only the name and import id of one slate
parse_json <- function(x) tibble(name = x$name, importID = x$importId)
# get id from slates
raw %>%
  html_nodes(".slate-data") %>%
  html_attr(name = "value") %>%
  jsonlite::fromJSON() %>%
  purrr::map_df(parse_json)
#> # A tibble: 10 x 2
#>    name                                                importID
#>    <chr>                                               <chr>
#>  1 1:00pm: Classic: 13 Games                           21505
#>  2 8:20pm: Classic (Thu-Mon): 16 Games                 21576
#>  3 1:00pm: Classic (Sun-Mon): 15 Games                 21586
#>  4 1:00pm: Tiers (NFL Tiers): 14 Games                 21589
#>  5 1:00pm: Classic (Early Only): 10 Games              21581
#>  6 4:05pm: Classic (Afternoon Only): 3 Games           21630
#>  7 4:25pm: Classic (Afternoon Turbo): 2 Games          21631
#>  8 8:20pm: Classic (Primetime): 2 Games                21645
#>  9 4:25pm: Showdown Captain Mode (DAL vs SEA): 1 Games 21632
#> 10 8:20pm: Showdown Captain Mode (NE vs DET): 1 Games  21644
raw %>%
  html_nodes(".select") %>%
  html_nodes("script") %>%
  html_text() %>%
  stringr::str_squish() %>%
  substr(1, 1000)
#> [1] "window.slateSelect = window.createReactComponent(SlateSelectRadnor, { slates: {\"All Games\":{\"games\":[{\"scheduleId\":\"45755\",\"teamAwayId\":\"12\",\"teamHomeId\":\"3\"},{\"scheduleId\":\"45756\",\"teamAwayId\":\"23\",\"teamHomeId\":\"21\"},{\"scheduleId\":\"45757\",\"teamAwayId\":\"9\",\"teamHomeId\":\"8\"},{\"scheduleId\":\"45758\",\"teamAwayId\":\"25\",\"teamHomeId\":\"1\"},{\"scheduleId\":\"45759\",\"teamAwayId\":\"14\",\"teamHomeId\":\"19\"},{\"scheduleId\":\"45760\",\"teamAwayId\":\"2\",\"teamHomeId\":\"22\"},{\"scheduleId\":\"45761\",\"teamAwayId\":\"31\",\"teamHomeId\":\"26\"},{\"scheduleId\":\"45762\",\"teamAwayId\":\"7\",\"teamHomeId\":\"20\"},{\"scheduleId\":\"45763\",\"teamAwayId\":\"27\",\"teamHomeId\":\"10\"},{\"scheduleId\":\"45764\",\"teamAwayId\":\"18\",\"teamHomeId\":\"13\"},{\"scheduleId\":\"45765\",\"teamAwayId\":\"16\",\"teamHomeId\":\"15\"},{\"scheduleId\":\"45766\",\"teamAwayId\":\"28\",\"teamHomeId\":\"30\"},{\"scheduleId\":\"45767\",\"teamAwayId\":\"5\",\"teamHomeId\":\"29\"},{\"scheduleId\":\"45768\",\"teamAwayId\":\"17\",\"teamHomeId\":\"32\"},{\"scheduleId\":\"45769\",\"teamAwayId\":\"11\",\"teamHomeId\":\"6\"},{\"scheduleId\":\"45770\","
Created on 2018-09-23 by the reprex package (v0.2.1)
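For the remaining step, one possible approach is to cut the JSON object that follows "slates:" out of the script text and hand it to jsonlite::fromJSON(). The sketch below is only that, a sketch: script_text is a shortened, hand-made stand-in for the real string (only the first two games mirror the output above, and the closing part is assumed), and extract_json_object is a hypothetical helper, not a function from any package. The real string is much longer and may be structured differently further in.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the script text returned by html_text().
# The ending "]}} });" is an assumption; the real string continues far beyond this.
script_text <- paste0(
  "window.slateSelect = window.createReactComponent(SlateSelectRadnor, { slates: ",
  "{\"All Games\":{\"games\":[",
  "{\"scheduleId\":\"45755\",\"teamAwayId\":\"12\",\"teamHomeId\":\"3\"},",
  "{\"scheduleId\":\"45756\",\"teamAwayId\":\"23\",\"teamHomeId\":\"21\"}",
  "]}} });"
)

# Hypothetical helper: return the brace-balanced object that starts at the
# first "{" after `key`, ignoring braces that occur inside quoted strings.
extract_json_object <- function(text, key) {
  key_pos <- regexpr(key, text, fixed = TRUE)
  if (key_pos < 0) stop("key not found: ", key)
  chars <- strsplit(substring(text, key_pos + nchar(key)), "")[[1]]
  start <- match("{", chars)
  if (is.na(start)) stop("no object found after key")
  depth <- 0
  in_string <- FALSE
  escaped <- FALSE
  for (i in seq(start, length(chars))) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      depth <- depth + 1
    } else if (ch == "}") {
      depth <- depth - 1
      if (depth == 0) return(paste(chars[start:i], collapse = ""))
    }
  }
  stop("unbalanced braces")
}

slates <- fromJSON(extract_json_object(script_text, "slates:"))
# fromJSON simplifies the array of game objects into a data frame.
all_games <- slates[["All Games"]]$games
print(all_games)
```

Counting braces instead of using a regular expression keeps the extraction from stopping at the first closing brace of a nested object. With the real page, script_text would be the full result of the html_text() call above (without the substr() truncation).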