I'm attempting to scrape data from http://www.footballoutsiders.com/stats/snapcounts, but I can't change the fields in the drop-down boxes on the site ("team", "week", "position", and "year"). My attempt to scrape the table for team = "ALL", week = "1", pos = "ALL", and year = "2015" with rvest is below.
library(rvest)

url <- "http://www.footballoutsiders.com/stats/snapcounts"
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[3]]
filled_form <- set_values(pgform,
                          "team" = "ALL",
                          "week" = "1",
                          "pos" = "ALL",
                          "year" = "2015")
submit_form(session = pgsession, form = filled_form, POST = url)
y <- read_html("http://www.footballoutsiders.com/stats/snapcounts")
y <- y %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(header = TRUE)
This code returns the table for the default values in the drop-down boxes (team = "ALL", week = "20", pos = "QB", and year = "2015"), which is a data frame with only 11 observations. If it had actually changed the fields, it would have returned a data frame with 1,695 observations.
You can capture the session returned when the form is submitted and use that session as input to html_nodes():
d <- submit_form(session = pgsession, form = filled_form)
y <- d %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(header = TRUE)
dim(y)
#[1] 1695 11
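Otherwise, if you use read_html(url), you make a fresh request for the original page, which still shows the default drop-down values, so the form submission has no effect on what you parse.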
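For readers on rvest 1.0.0 or later, where html_session(), set_values(), and submit_form() were superseded by session(), html_form_set(), and session_submit(), the same idea might look like the sketch below. The function name scrape_snap_counts is hypothetical, the form index 3 and table index 2 are carried over from the code above, and the whole thing assumes the page still has the same layout.

```r
library(rvest)

# Hypothetical helper: submit the drop-down form and parse the page that the
# submission returns. Form index 3 and table index 2 are taken from the code
# above and assume the page layout has not changed.
scrape_snap_counts <- function(url, team = "ALL", week = "1",
                               pos = "ALL", year = "2015") {
  s <- session(url)
  form <- html_form(s)[[3]]
  filled <- html_form_set(form, team = team, week = week,
                          pos = pos, year = year)

  # Keep the session returned by the submission; it holds the filtered page.
  result <- session_submit(s, filled)

  # Parse the submitted response, not a fresh request to the original URL.
  page <- read_html(result)
  tables <- html_elements(page, "table")
  html_table(tables[[2]], header = TRUE)
}

# Usage (needs network access):
# y <- scrape_snap_counts("http://www.footballoutsiders.com/stats/snapcounts")
# dim(y)
```

Wrapping the steps in a function keeps the submitted session and the parsing together, so it is harder to accidentally re-read the original URL.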