
Scrape multiple linked HTML tables in R with rvest

This article (http://www.ajnr.org/content/30/7/1402.full) contains four links to HTML tables, which I would like to scrape with rvest.

With the help of the CSS selector:

"#T1 a" 

it's possible to get to the first table like this:

library("rvest")
html_session("http://www.ajnr.org/content/30/7/1402.full") %>%
  follow_link(css="#T1 a") %>%
  html_table() %>%
  View()

The CSS selector:

".table-inline li:nth-child(1) a"

makes it possible to select all four HTML nodes, i.e. the <a> tags that link to the four tables:

library("rvest")
# read_html() replaces the deprecated html()
read_html("http://www.ajnr.org/content/30/7/1402.full") %>%
  html_nodes(css=".table-inline li:nth-child(1) a")

How would it be possible to loop through this list and retrieve all four tables in one go? What's the best approach?

asked Feb 25 '15 21:02 by landge


1 Answer

Here's one approach:

library(rvest)

url <- "http://www.ajnr.org/content/30/7/1402.full"
page <- read_html(url)

# First find all the urls
table_urls <- page %>% 
  html_nodes(".table-inline li:nth-child(1) a") %>%
  html_attr("href") %>%
  xml2::url_absolute(url)

# Then loop over the urls, downloading & extracting the table
lapply(table_urls, . %>% read_html() %>% html_table())
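
To see what each step returns without hitting the network, here is a minimal offline sketch of the same two steps. The inline HTML strings, the relative hrefs, and the names page_src, table_pages and tables are made up for illustration; only the selector and the article URL come from the question. Note that html_table() on a whole document returns a list of tables, so the sketch keeps the first table from each page, which assumes each linked page holds exactly one table.

```r
library(rvest)

# Hypothetical stand-in for the article page: two list blocks, each with a
# relative link to a table page (the real hrefs may look different)
page_src <- paste0(
  "<html><body>",
  "<ul class='table-inline'><li><a href='T1.expansion.html'>Table 1</a></li></ul>",
  "<ul class='table-inline'><li><a href='T2.expansion.html'>Table 2</a></li></ul>",
  "</body></html>"
)
base_url <- "http://www.ajnr.org/content/30/7/1402.full"

# Step 1: collect the hrefs and resolve them against the article URL
table_urls <- read_html(page_src) %>%
  html_nodes(".table-inline li:nth-child(1) a") %>%
  html_attr("href") %>%
  xml2::url_absolute(base_url)

# Hypothetical stand-ins for the downloaded table pages
table_pages <- c(
  "<table><tr><th>id</th><th>value</th></tr><tr><td>1</td><td>a</td></tr></table>",
  "<table><tr><th>id</th><th>value</th></tr><tr><td>2</td><td>b</td></tr></table>"
)

# Step 2: parse each page and keep its first table, so the result is a
# flat list with one data frame per page
tables <- lapply(table_pages, function(src) {
  html_table(read_html(src))[[1]]
})
```

Against the live site, the same function body could be applied to table_urls instead of table_pages, since read_html() accepts a URL as well as a literal HTML string.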
answered Oct 25 '22 13:10 by hadley