Extract a specific table from Wikipedia in R

Question

I want to extract the 20th table from a Wikipedia page https://en.wikipedia.org/wiki/...

I currently use this code, but it only extracts the first table on the page (the one at the top).

library(rvest)  # also provides the %>% pipe

the_url <- "https://en.wikipedia.org/wiki/..."
tb <- the_url %>% read_html() %>%
  html_node("table") %>%   # html_node() returns only the first match
  html_table(fill = TRUE)

What should I do to get the specific one? Thank you!!

QHarr · Accepted Answer

Instead of indexing by position, which can shift whenever the page is edited, you could anchor on the element with id Prize_money and take the first table that follows its parent heading. Using html_node() returns just a single node, which is more efficient. Avoid long absolute XPaths, as they can be fragile.

library(rvest)

# Named prize_table rather than table to avoid masking base::table()
prize_table <- read_html('https://en.wikipedia.org/wiki/2018_FIFA_World_Cup#Prize_money') %>%
  html_node(xpath = "//*[@id='Prize_money']/parent::h4/following-sibling::table[1]") %>%
  html_table(fill = TRUE)
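
To see what the anchored XPath does without depending on the live page, here is a minimal sketch run against an inline HTML fragment. The fragment is a hypothetical stand-in, not the real Wikipedia markup: it only assumes the heading id sits on a child of an h4, as the XPath above expects.

```r
library(rvest)

# Hypothetical stand-in for the page: two headings, each followed by a table.
page <- read_html('
  <html><body>
    <h4><span id="Venues">Venues</span></h4>
    <table><tr><th>City</th></tr><tr><td>Moscow</td></tr></table>
    <h4><span id="Prize_money">Prize money</span></h4>
    <p>Intro text between the heading and the table.</p>
    <table><tr><th>Stage</th><th>Amount</th></tr><tr><td>Winner</td><td>38</td></tr></table>
    <table><tr><th>Other</th></tr><tr><td>x</td></tr></table>
  </body></html>')

# Find the element with the id, step up to its h4 parent, then take the
# first <table> among the siblings that follow that heading.
prize <- page %>%
  html_node(xpath = "//*[@id='Prize_money']/parent::h4/following-sibling::table[1]") %>%
  html_table()

print(prize)
```

The paragraph between the heading and the table does not matter, because following-sibling::table[1] skips non-table siblings. If the real page nests its headings differently, the parent::h4 step would need adjusting.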

DPH · Answer

Since you have a specific table you want to scrape, you can identify it in the html_nodes() call by using the XPath of the page element:

library(dplyr)
library(rvest)

the_url <- "https://en.wikipedia.org/wiki/2018_FIFA_World_Cup"

the_url %>%
  read_html() %>% 
  html_nodes(xpath='/html/body/div[3]/div[3]/div[5]/div[1]/table[20]') %>% 
  html_table(fill=TRUE)
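
If the goal really is "the n-th table on the page", a shorter option is to select every table and index into the result. The sketch below uses a hypothetical three-table fragment and n = 2 so that it runs offline; for the question as asked, n would be 20. Note that table[20] in the absolute path above counts only tables that are direct children of that one div, which is not necessarily the same as the 20th table in the whole document.

```r
library(rvest)

# Hypothetical page with three tables; the second is nested inside a <div>.
page <- read_html('
  <html><body>
    <table><tr><th>id</th></tr><tr><td>1</td></tr></table>
    <div><table><tr><th>id</th></tr><tr><td>2</td></tr></table></div>
    <table><tr><th>id</th></tr><tr><td>3</td></tr></table>
  </body></html>')

all_tables <- html_nodes(page, "table")  # every <table>, in document order
n <- 2                                   # would be 20 for the table in the question
nth_table <- html_table(all_tables[[n]])

# The same selection as a single XPath. The parentheses make [n] apply to
# the whole result set rather than to each parent element separately.
nth_xpath <- page %>%
  html_node(xpath = sprintf("(//table)[%d]", n)) %>%
  html_table()

print(nth_table)
```

Both forms still depend on table order, so they break if tables are added or removed earlier in the page.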

Tags: r

Drick
