
Iteratively Attach a Value to All Records in a Data Frame Created by a Loop

I'm scraping individual game stats over a player's career from basketball-reference.com (this part works), but I want to add the player's name to the resulting df so that it sits alongside the individual game results. For example, the first iteration should just repeat "Kareem Abdul-Jabbar" 86 times for the 86 rows generated by the scrape. I'm trying to get each subsequent iteration to add to an existing column named "Player_Name" using cbind, but cbind instead creates a new column on every iteration. Any advice on how to get the player's name into a single column would be much appreciated.

library(rvest)
library(dplyr)

# Create df of players to be scraped
#########################################################################
players = data.frame(player_name = c(rep("Kareem Abdul-Jabbar",each=20),
                                rep("Karl Malone",each=19)),
                     player_id = c(rep("abdulka01",each=20),
                                rep("malonka01",each=19)),
                     initial = c(rep("a",each=20),
                                 rep("m",each=19)),
                     year = c(seq(1970,1989,by=1),
                              seq(1986,2004,by=1)))

# Scrape data and stack in a df
#########################################################################
output <- data_frame()
for (i in 1:2){
  
  url <- paste0("https://www.basketball-reference.com/players/",
                players[i,3],"/",players[i,2],"/gamelog/",players[i,4])
  
  webpage <- read_html(url)
  
  temp <- webpage %>%
    html_nodes("#pgl_basic") %>%
    html_table()
  
  player_name=players[i,1]
  
  output <- cbind(bind_rows(output, temp),player_name)
}
P5C768 asked Oct 24 '25 02:10


2 Answers

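You can build the URLs to scrape with sprintf and use map_df to combine the results into one data frame. The .id argument records which URL each row came from, and that index is then mapped back to the player's name.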

library(rvest)
library(tidyverse)

urls <- sprintf("https://www.basketball-reference.com/players/%s/%s/gamelog/%s", 
        players$initial, players$player_id, players$year)

result <- map_df(urls, ~.x %>% 
                  read_html() %>%
                  html_nodes("#pgl_basic") %>%
                  html_table(), .id = 'playername') %>% 
  mutate(playername = players$player_name[as.numeric(playername)])
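
To see what .id does without hitting the website, here is a minimal offline sketch. The fake_gamelog helper and the game counts are hypothetical stand-ins for the scraping step; only the .id lookup mirrors the code above.

```r
library(dplyr)
library(purrr)

# Hypothetical player table: one row per "page" to scrape
players <- tibble(
  player_name = c("Kareem Abdul-Jabbar", "Karl Malone"),
  n_games     = c(3L, 2L)
)

# Hypothetical stand-in for read_html() + html_table(): returns n rows
fake_gamelog <- function(n) tibble(G = seq_len(n))

# .id adds a column holding the position of each input as a string
result <- map_df(players$n_games, fake_gamelog, .id = "playername") %>%
  # convert the position back to a number and look up the name
  mutate(playername = players$player_name[as.numeric(playername)])

print(result)
```

Because the input vector is unnamed, .id stores the position of each element as a character string, which is why as.numeric() is needed before indexing players$player_name. In purrr 1.0.0 and later, map_df is superseded; map() followed by list_rbind() is the suggested replacement, although map_df still works.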
Ronak Shah answered Oct 25 '25 16:10


There's a much cleaner way of approaching this problem with functional programming. First, we store the parameters in tibbles, one row per player and season (the single-value columns are recycled across the years).

library(tidyverse)
library(rvest) # read_html, html_nodes, html_table; not attached by library(tidyverse)
library(glue)

kareem <- tibble(
  player_name = 'Kareem Abdul-Jabbar',
  player_id = 'abdulka01',
  initial = 'a',
  year = 1970:1989)

karl <- tibble(
  player_name = 'Karl Malone',
  player_id = 'malonka01',
  initial = 'm',
  year = 1986:2004)

Now to replicate your loop:

bind_rows(kareem, karl) %>%
  mutate(
    url = pmap_chr( # iterate over multiple variables and return a character vector
      list(initial, player_id, year), # choose these variables
      function(initial, player_id, year) { # and apply this function
        
        base <- "https://www.basketball-reference.com/players"
        
        glue('{base}/{initial}/{player_id}/gamelog/{year}')  # returns the url
        
      }),
    webpage = map(url, read_html), # iterate over urls and apply read_html
    temp = map(
      webpage,  # iterate over webpage
      ~ html_nodes(.x, "#pgl_basic") %>% # and apply this function
        html_table())
    ) -> # assign to a new tibble
scraped_data
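
At this point the tables sit in the list-column temp, so the player's name is still not attached to each game row. One way to finish, sketched here on made-up data rather than the real scrape, is to unnest that column. The sketch assumes each element of temp is a list holding a single table (html_table() is applied to a node set above), and the G and PTS values are invented purely for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Made-up stand-in for the scraped tibble: one row per page,
# with temp holding a list that contains one table (an assumption)
scraped <- tibble(
  player_name = c("Kareem Abdul-Jabbar", "Karl Malone"),
  temp = list(
    list(tibble(G = 1:3, PTS = c(10L, 20L, 30L))),
    list(tibble(G = 1:2, PTS = c(15L, 25L)))
  )
)

games <- scraped %>%
  mutate(temp = map(temp, 1)) %>% # keep the first table from each list
  unnest(temp)                    # one row per game, player_name repeated

print(games)
```

unnest() repeats the outer columns for every row of the nested table, which gives the single player name column the question asks for.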


Note that the lambda notation ~ .x is purrr shorthand for an anonymous function. The three calls below are equivalent:

square <- function(v) {v^2}

map_dbl(1:4, ~ .x^2)
map_dbl(1:4, function(.x) .x^2)
map_dbl(1:4, square)
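
If you would rather keep the original for loop, a small change is probably enough: add the name to temp before stacking, instead of calling cbind on the accumulated output, which appends one more column on every pass. A minimal offline sketch, where fake_scrape is a hypothetical stand-in for the read_html and html_table step:

```r
library(dplyr)

# Hypothetical player table; n_games replaces the real pages to scrape
players <- data.frame(
  player_name = c("Kareem Abdul-Jabbar", "Karl Malone"),
  n_games     = c(3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the scrape: returns a table with n rows
fake_scrape <- function(n) data.frame(G = seq_len(n))

output <- tibble()
for (i in seq_len(nrow(players))) {
  temp <- fake_scrape(players$n_games[i])
  # a single value is recycled to every row of this iteration's table
  temp$Player_Name <- players$player_name[i]
  # stack rows only; no cbind on the accumulated result
  output <- bind_rows(output, temp)
}

print(output)
```

With the real scrape, html_table() on a node set returns a list, so temp would likely need to be reduced to its first element (temp[[1]]) before the column is added.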
Pedro Cavalcante answered Oct 25 '25 16:10
