Scraping NBA data in R with rjson

I have spent a long time trying to scrape NBA data with R. Until now I was working mostly by trial and error, but I finally found this documentation. Some time ago I had problems scraping the shotchartdetail endpoint, and I worked out the cause when I found this

This works

This is what I did for shotchartdetail:

shotURLtotal <- paste0("http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2016-17&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerID=0&PlusMinus=N&Position=&Rank=N&RookieYear=&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&showZones=0&PlayerPosition=")

Season <- rjson::fromJSON(file = shotURLtotal, method="C")
Names <- Season$resultSets[[1]][[2]]

Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))

colnames(Season) <- Names
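The conversion above can be wrapped in a small reusable function. The following is only a sketch: `result_set_to_df` is a hypothetical helper name, and it assumes each result set is a list with `headers` and `rowSet` elements, as the answer further down accesses them. It also replaces JSON nulls with `NA` first, because `unlist()` silently drops `NULL` entries and that would misalign the matrix.

```r
# Hypothetical helper (not part of rjson): turn one result set, assumed to be
# a list with `headers` and `rowSet`, into a data frame.
result_set_to_df <- function(result_set) {
  headers <- unlist(result_set$headers)
  rows <- result_set$rowSet

  if (length(rows) == 0) {
    # No rows: return an empty data frame that still carries the column names
    empty <- data.frame(matrix(nrow = 0, ncol = length(headers)))
    colnames(empty) <- headers
    return(empty)
  }

  # Replace JSON nulls (NULL in R) with NA so unlist() does not drop them
  rows <- lapply(rows, function(row) {
    lapply(row, function(cell) if (is.null(cell)) NA else cell)
  })

  # Mixed types are coerced to character here, as in the original approach
  df <- data.frame(
    matrix(unlist(rows), ncol = length(headers), byrow = TRUE),
    stringsAsFactors = FALSE
  )
  colnames(df) <- headers
  df
}
```

With the parsed response from above, this would be called as `result_set_to_df(Season$resultSets[[1]])` before `Season` is overwritten.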

But this does not

When I try to do the same with shotchartlineupdetail, it does not work. I suspect it has to do with the CFID, whose meaning I don't know. This is what I tried:

shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"


Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]

Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))

colnames(Season) <- Names

Expected Results

The expected result should be a dataframe with the following columns:

c("GRID_TYPE", "GAME_ID", "GAME_EVENT_ID", "GROUP_ID", "GROUP_NAME", "PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_NAME", "PERIOD", "MINUTES_REMAINING", "SECONDS_REMAINING", "EVENT_TYPE", "ACTION_TYPE", "SHOT_TYPE", "SHOT_ZONE_BASIC", "SHOT_ZONE_AREA", "SHOT_ZONE_RANGE", "SHOT_DISTANCE", "LOC_X", "LOC_Y", "SHOT_ATTEMPTED_FLAG", "SHOT_MADE_FLAG", "GAME_DATE", "HTM", "VTM")

which you can get by doing:

shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"


Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]

So Names holds the columns of the data frame. The problem is that without the CFID, the list that should contain the data for those columns comes back empty. The answer that @be_green gives returns the league averages, and I need the team-specific data.
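One way to confirm which part of the response is empty is to count the rows in every result set before building the data frame. This is a rough sketch: `summarise_result_sets` is a hypothetical helper, and it assumes each result set carries `name`, `headers`, and `rowSet` elements (the `name` field is inferred from `headers` and `rowSet` sitting at positions 2 and 3 in the code above).

```r
# Hypothetical diagnostic: one row per result set with its size.
# Assumes `parsed` is the list returned by rjson::fromJSON() or similar.
summarise_result_sets <- function(parsed) {
  do.call(rbind, lapply(parsed$resultSets, function(rs) {
    data.frame(
      name = if (is.null(rs$name)) NA_character_ else rs$name,
      n_headers = length(rs$headers),  # number of column names
      n_rows = length(rs$rowSet),      # number of data rows
      stringsAsFactors = FALSE
    )
  }))
}
```

Calling `summarise_result_sets(Season)` right after the `fromJSON()` call would show whether the first set has headers but zero rows, and whether any other set (for example a league-average one) does contain data.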

Derek Corcoran asked Dec 11 '17 01:12


1 Answer

So I believe the issue here is that you need to pass a PlayerID and TeamID to the API. Using PlayerID = 2544 and TeamID = 1610612739 below as an example seems to work:

library(tidyverse)
res <- jsonlite::read_json("https://stats.nba.com/stats/shotchartdetail?AheadBehind=&ClutchTime=&ContextFilter=&ContextMeasure=PTS&DateFrom=&DateTo=&EndPeriod=&EndRange=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&PlayerID=2544&PlayerPosition=&PointDiff=&Position=&RangeType=&RookieYear=&Season=&SeasonSegment=&SeasonType=Regular+Season&StartPeriod=&StartRange=&TeamID=1610612739&VsConference=&VsDivision=")
# res %>% str(max.level = 3)

header_names <- flatten_chr(res$resultSets[[1]]$headers)
header_names
#>  [1] "GRID_TYPE"           "GAME_ID"             "GAME_EVENT_ID"      
#>  [4] "PLAYER_ID"           "PLAYER_NAME"         "TEAM_ID"            
#>  [7] "TEAM_NAME"           "PERIOD"              "MINUTES_REMAINING"  
#> [10] "SECONDS_REMAINING"   "EVENT_TYPE"          "ACTION_TYPE"        
#> [13] "SHOT_TYPE"           "SHOT_ZONE_BASIC"     "SHOT_ZONE_AREA"     
#> [16] "SHOT_ZONE_RANGE"     "SHOT_DISTANCE"       "LOC_X"              
#> [19] "LOC_Y"               "SHOT_ATTEMPTED_FLAG" "SHOT_MADE_FLAG"     
#> [22] "GAME_DATE"           "HTM"                 "VTM"

res$resultSets[[1]]$rowSet %>%
  map(`[`, 1:24) %>%
  map(~ set_names(., header_names)) %>%
  bind_rows()
#> # A tibble: 8,369 x 24
#>    GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME
#>    <chr>     <chr>           <int>     <int> <chr>         <int> <chr>    
#>  1 Shot Cha~ 002030~            20      2544 LeBron Jam~  1.61e9 Clevelan~
#>  2 Shot Cha~ 002030~            28      2544 LeBron Jam~  1.61e9 Clevelan~
#>  3 Shot Cha~ 002030~            35      2544 LeBron Jam~  1.61e9 Clevelan~
#>  4 Shot Cha~ 002030~            54      2544 LeBron Jam~  1.61e9 Clevelan~
#>  5 Shot Cha~ 002030~            67      2544 LeBron Jam~  1.61e9 Clevelan~
#>  6 Shot Cha~ 002030~            76      2544 LeBron Jam~  1.61e9 Clevelan~
#>  7 Shot Cha~ 002030~           224      2544 LeBron Jam~  1.61e9 Clevelan~
#>  8 Shot Cha~ 002030~           233      2544 LeBron Jam~  1.61e9 Clevelan~
#>  9 Shot Cha~ 002030~           235      2544 LeBron Jam~  1.61e9 Clevelan~
#> 10 Shot Cha~ 002030~           322      2544 LeBron Jam~  1.61e9 Clevelan~
#> # ... with 8,359 more rows, and 17 more variables: PERIOD <int>,
#> #   MINUTES_REMAINING <int>, SECONDS_REMAINING <int>, EVENT_TYPE <chr>,
#> #   ACTION_TYPE <chr>, SHOT_TYPE <chr>, SHOT_ZONE_BASIC <chr>,
#> #   SHOT_ZONE_AREA <chr>, SHOT_ZONE_RANGE <chr>, SHOT_DISTANCE <int>,
#> #   LOC_X <int>, LOC_Y <int>, SHOT_ATTEMPTED_FLAG <int>,
#> #   SHOT_MADE_FLAG <int>, GAME_DATE <chr>, HTM <chr>, VTM <chr>

Created on 2019-03-26 by the reprex package (v0.2.1)
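Since the fix in this answer comes down to which query parameters are passed, it can help to build the URL from a named list so that PlayerID and TeamID are easy to swap. The sketch below uses a hypothetical helper, `build_stats_url`. It assumes the values are already URL-safe (for example `Regular+Season`) and performs no encoding. It does not check which parameters a given endpoint requires.

```r
# Hypothetical helper: assemble a stats.nba.com URL from named parameters.
# Assumption: values are already URL-safe; no percent-encoding is applied.
build_stats_url <- function(endpoint, params) {
  values <- vapply(params, as.character, character(1))
  query <- paste(names(params), values, sep = "=", collapse = "&")
  paste0("https://stats.nba.com/stats/", endpoint, "?", query)
}

# Example with the IDs used in this answer; empty strings stay as "Name="
url <- build_stats_url(
  "shotchartdetail",
  list(PlayerID = "2544", TeamID = "1610612739",
       SeasonType = "Regular+Season", Season = "")
)
```

The resulting string can be passed to `jsonlite::read_json()` in the same way as the literal URL above, once the remaining parameters from that URL are added to the list.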

JasonAizkalns answered Oct 23 '22 01:10