Scrape multiple linked HTML tables in R with rvest

Question

This article, http://www.ajnr.org/content/30/7/1402.full, contains four links to HTML tables that I would like to scrape with rvest.

With the help of the CSS selector:

"#T1 a"

it's possible to get to the first table like this:

library("rvest")
html_session("http://www.ajnr.org/content/30/7/1402.full") %>%
  follow_link(css = "#T1 a") %>%
  html_table() %>%
  View()

The CSS selector:

".table-inline li:nth-child(1) a"

makes it possible to select all four HTML nodes containing the tags that link to the four tables:

library("rvest")
# read_html() replaces the older html(), matching the answer below
read_html("http://www.ajnr.org/content/30/7/1402.full") %>%
  html_nodes(css = ".table-inline li:nth-child(1) a")

How can I loop through this list and retrieve all four tables in one go? What's the best approach?

hadley · Accepted Answer

Here's one approach:

library(rvest)

url <- "http://www.ajnr.org/content/30/7/1402.full"
page <- read_html(url)

# First find all the urls
table_urls <- page %>% 
  html_nodes(".table-inline li:nth-child(1) a") %>%
  html_attr("href") %>%
  xml2::url_absolute(url)

# Then loop over the urls, downloading & extracting the table
lapply(table_urls, . %>% read_html() %>% html_table())
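
Note that html_table() applied to a whole page returns a list of tables (one per <table> element), so the lapply() call above yields a list of lists. Here is a minimal sketch of flattening that result. It uses two made-up inline HTML strings as stand-ins for the downloaded pages so it runs offline; the data and the names T1, T2 are assumptions for illustration, not taken from the article.

```r
library(rvest)

# Hypothetical stand-ins for the downloaded table pages (made-up data)
pages <- list(
  read_html("<html><body><table><tr><th>id</th><th>value</th></tr><tr><td>1</td><td>a</td></tr></table></body></html>"),
  read_html("<html><body><table><tr><th>id</th><th>value</th></tr><tr><td>2</td><td>b</td></tr></table></body></html>")
)

# html_table() on a document returns a list with one entry per <table>
per_page <- lapply(pages, html_table)

# Keep only the first table found on each page
tables <- lapply(per_page, function(x) x[[1]])

# Name the list elements so each table can be looked up directly
names(tables) <- paste0("T", seq_along(tables))
```

With the real pages, per_page would be the result of the lapply() call over table_urls, and tables$T1 would then hold the first table. If a page might contain more than one table, picking x[[1]] is a guess and the selector should probably be narrowed instead.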

Tags:

r

web-scraping

rvest

landge
