I have been using R to scrape NBA data for a long time. Until now I worked mostly by trial and error, but I finally found this documentation. Some time ago I had problems scraping the shotchartdetail endpoint, and the documentation helped me figure out what was wrong.
This is what I did:
# Query shotchartdetail for the 2016-17 regular season (PlayerID=0, TeamID=0, field goal attempts)
shotURLtotal <- "http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2016-17&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerID=0&PlusMinus=N&Position=&Rank=N&RookieYear=&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&showZones=0&PlayerPosition="
Season <- rjson::fromJSON(file = shotURLtotal, method = "C")
# In the first result set, element 2 holds the column headers and element 3 the rows
Names <- Season$resultSets[[1]][[2]]
Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))
colnames(Season) <- Names
But when I try to do the same with shotchartlineupdetail, it does not work. I suspect it has to do with the CFID, whose meaning I don't know. This is what I tried:
shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"
Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]
Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))
colnames(Season) <- Names
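One thing worth checking in both snippets above: `unlist()` silently drops `NULL` elements, so if any cell in a row comes back as JSON `null`, the `matrix()` call can misalign the columns. Below is a minimal sketch of a more defensive conversion. `result_set_to_df` is a hypothetical helper name, and the `mock` list is made-up data that only imitates the `headers`/`rowSet` layout, so nothing here touches the network.

```r
# Hypothetical helper: convert one result set (headers + rowSet) to a data frame.
result_set_to_df <- function(result_set) {
  headers <- unlist(result_set$headers)
  rows <- result_set$rowSet

  # An empty rowSet still yields a zero-row data frame with the expected columns
  if (length(rows) == 0) {
    empty <- data.frame(matrix(nrow = 0, ncol = length(headers)))
    names(empty) <- headers
    return(empty)
  }

  # Build the table column by column, turning NULL cells into NA
  cols <- lapply(seq_along(headers), function(j) {
    unlist(lapply(rows, function(row) {
      cell <- row[[j]]
      if (is.null(cell)) NA else cell
    }))
  })
  names(cols) <- headers
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Made-up data in the same shape as a parsed response (assumption, not real output)
mock <- list(
  headers = list("PLAYER_NAME", "LOC_X", "SHOT_MADE_FLAG"),
  rowSet = list(
    list("Player A", -12L, 1L),
    list("Player B", NULL, 0L)
  )
)
shots <- result_set_to_df(mock)
```

With a real response you would call it as `result_set_to_df(Season$resultSets[[1]])`, assuming the parsed list keeps the `headers` and `rowSet` names.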
The expected result is a data frame with the following columns:
c("GRID_TYPE", "GAME_ID", "GAME_EVENT_ID", "GROUP_ID", "GROUP_NAME", "PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_NAME", "PERIOD", "MINUTES_REMAINING", "SECONDS_REMAINING", "EVENT_TYPE", "ACTION_TYPE", "SHOT_TYPE", "SHOT_ZONE_BASIC", "SHOT_ZONE_AREA", "SHOT_ZONE_RANGE", "SHOT_DISTANCE", "LOC_X", "LOC_Y", "SHOT_ATTEMPTED_FLAG", "SHOT_MADE_FLAG", "GAME_DATE", "HTM", "VTM")
which you can get by running:
shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"
Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]
So Names holds the columns of the data frame. The problem is that without the CFID, the list that should hold the data for those columns is empty. The answer that @be_green gives returns the league averages, and I need the team-specific data.
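To see at a glance which result set is empty and which one carries the league averages, it can help to summarise every result set in the response before converting anything. This is only a sketch: `summarise_result_sets` is a hypothetical helper, and the `mock_response` names and values are invented to mimic the `name`/`headers`/`rowSet` layout rather than copied from the API.

```r
# Hypothetical helper: one row per result set with its name, column count and row count.
summarise_result_sets <- function(response) {
  sets <- response$resultSets
  data.frame(
    name      = vapply(sets, function(rs) rs$name, character(1)),
    n_columns = vapply(sets, function(rs) length(rs$headers), integer(1)),
    n_rows    = vapply(sets, function(rs) length(rs$rowSet), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Invented response: an empty detail set followed by a populated averages set
mock_response <- list(
  resultSets = list(
    list(name = "DetailSet", headers = list("GAME_ID", "LOC_X"), rowSet = list()),
    list(name = "AverageSet", headers = list("SHOT_ZONE_BASIC", "FG_PCT"),
         rowSet = list(list("Mid-Range", 0.4)))
  )
)
overview <- summarise_result_sets(mock_response)
print(overview)
```

If the detail set shows zero rows while the averages set does not, the request itself succeeded and the filter parameters are the likelier culprit.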
I believe the issue here is that you need to pass a PlayerID and a TeamID to the API. Using PlayerID = 2544 and TeamID = 1610612739 below as an example seems to work:
library(tidyverse)
res <- jsonlite::read_json("https://stats.nba.com/stats/shotchartdetail?AheadBehind=&ClutchTime=&ContextFilter=&ContextMeasure=PTS&DateFrom=&DateTo=&EndPeriod=&EndRange=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&PlayerID=2544&PlayerPosition=&PointDiff=&Position=&RangeType=&RookieYear=&Season=&SeasonSegment=&SeasonType=Regular+Season&StartPeriod=&StartRange=&TeamID=1610612739&VsConference=&VsDivision=")
# res %>% str(max.level = 3)
header_names <- flatten_chr(res$resultSets[[1]]$headers)
header_names
#> [1] "GRID_TYPE" "GAME_ID" "GAME_EVENT_ID"
#> [4] "PLAYER_ID" "PLAYER_NAME" "TEAM_ID"
#> [7] "TEAM_NAME" "PERIOD" "MINUTES_REMAINING"
#> [10] "SECONDS_REMAINING" "EVENT_TYPE" "ACTION_TYPE"
#> [13] "SHOT_TYPE" "SHOT_ZONE_BASIC" "SHOT_ZONE_AREA"
#> [16] "SHOT_ZONE_RANGE" "SHOT_DISTANCE" "LOC_X"
#> [19] "LOC_Y" "SHOT_ATTEMPTED_FLAG" "SHOT_MADE_FLAG"
#> [22] "GAME_DATE" "HTM" "VTM"
res$resultSets[[1]]$rowSet %>%
  map(`[`, 1:24) %>%
  map(~ set_names(., header_names)) %>%
  bind_rows()
#> # A tibble: 8,369 x 24
#> GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME
#> <chr> <chr> <int> <int> <chr> <int> <chr>
#> 1 Shot Cha~ 002030~ 20 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 2 Shot Cha~ 002030~ 28 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 3 Shot Cha~ 002030~ 35 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 4 Shot Cha~ 002030~ 54 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 5 Shot Cha~ 002030~ 67 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 6 Shot Cha~ 002030~ 76 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 7 Shot Cha~ 002030~ 224 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 8 Shot Cha~ 002030~ 233 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 9 Shot Cha~ 002030~ 235 2544 LeBron Jam~ 1.61e9 Clevelan~
#> 10 Shot Cha~ 002030~ 322 2544 LeBron Jam~ 1.61e9 Clevelan~
#> # ... with 8,359 more rows, and 17 more variables: PERIOD <int>,
#> # MINUTES_REMAINING <int>, SECONDS_REMAINING <int>, EVENT_TYPE <chr>,
#> # ACTION_TYPE <chr>, SHOT_TYPE <chr>, SHOT_ZONE_BASIC <chr>,
#> # SHOT_ZONE_AREA <chr>, SHOT_ZONE_RANGE <chr>, SHOT_DISTANCE <int>,
#> # LOC_X <int>, LOC_Y <int>, SHOT_ATTEMPTED_FLAG <int>,
#> # SHOT_MADE_FLAG <int>, GAME_DATE <chr>, HTM <chr>, VTM <chr>
Created on 2019-03-26 by the reprex package (v0.2.1)
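Since the fix comes down to changing PlayerID and TeamID, it may be easier to build the query string from a named list than to edit the long URL by hand. The sketch below uses only base R. `build_stats_url` is a hypothetical helper, and the short parameter list is an illustration: the URL above spells out many more (mostly empty) parameters, and I have not verified which of them the endpoint actually requires.

```r
# Hypothetical helper: assemble a stats.nba.com URL from an endpoint name and named parameters.
build_stats_url <- function(endpoint, params) {
  base <- "https://stats.nba.com/stats/"
  # Percent-encode each value; empty strings stay empty ("Season=")
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(base, endpoint, "?", query)
}

# Illustrative subset of parameters (assumption: the real request needs the full list)
shot_url <- build_stats_url(
  "shotchartdetail",
  list(
    PlayerID = 2544,
    TeamID = 1610612739,
    SeasonType = "Regular Season",
    Season = ""
  )
)
print(shot_url)
```

Note that `URLencode()` writes the space as `%20` rather than the `+` used in the URL above; both are common encodings for a space in a query string.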