I'm looking to sharpen my data science skills. I am practicing pulling data from a URL on a sports site, and the JSON it returns contains multiple nested dictionaries. I would like to use this data to build my own custom version of the leaderboard in matplotlib, etc., but I am having a hard time getting the JSON into a workable DataFrame.
The main website is: https://www.usopen.com/scoring.html
Looking in the background, I believe the live info is being pulled from the link in the short code below. I'm working in Jupyter notebooks, and I can pull the data successfully.
But as you can see, it returns multiple nested dictionaries, which makes it very difficult to get a simple DataFrame.
I am just looking to get player, score to par, total, and round. Any help would be greatly appreciated, thank you!
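import json
import urllib.request  # importing urllib alone does not reliably expose urllib.request
import pandas as pd

url = "https://gripapi-static-pd.usopen.com/gripapi/leaderboard.json"
# download the leaderboard and parse the JSON body into a Python dict
response = urllib.request.urlopen(url)
data = json.loads(response.read())
print(data)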
Read a JSON file into a DataFrame: you can convert JSON to a pandas DataFrame with read_json() by passing it a JSON string or file. It takes multiple parameters; orient specifies the expected layout of the JSON.
Python's json module loads JSON as a Python dictionary, and pandas can load it as a DataFrame. Use pd.read_json() to load simple JSON and pd.json_normalize() to load nested JSON.
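As a minimal sketch of what "nested" means here, the snippet below compares a plain DataFrame with json_normalize. The sample dict is made up for illustration and is not taken from the US Open feed.

```python
import pandas as pd

# Hypothetical sample, shaped loosely like a nested leaderboard record
sample = {
    "standings": [
        {"player": {"firstName": "Ann", "lastName": "Lee"}, "toPar": {"value": -3}},
        {"player": {"firstName": "Bo", "lastName": "Kim"}, "toPar": {"value": 1}},
    ]
}

# A plain DataFrame keeps each nested dict as a single object column
raw = pd.DataFrame(sample["standings"])

# json_normalize flattens nested dicts into dotted column names
flat = pd.json_normalize(sample, record_path="standings")

print(list(raw.columns))
print(list(flat.columns))
```

The dotted names (for example player.firstName) are what make the flattened frame easy to filter by column.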
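Use requests.get(url).json() to get the data.
Use pandas.json_normalize to unpack the standings key into a dataframe.
roundScores is a list of dicts, so the list must be expanded with .explode, and the resulting column of dicts must then be normalized and joined back to df.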
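import requests
import pandas as pd

# same URL as in the question
url = "https://gripapi-static-pd.usopen.com/gripapi/leaderboard.json"

# load the data
df = pd.json_normalize(requests.get(url).json(), 'standings')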
# explode the roundScores column
df = df.explode('roundScores').reset_index(drop=True)
# normalize the dicts in roundScores and join back to df
df = df.join(pd.json_normalize(df.roundScores), rsuffix='_rs').drop(columns=['roundScores']).reset_index(drop=True)
# display(df.head())
| | isRecapAvailable | player.identifier | player.firstName | player.lastName | player.image.gravity | player.image.type | player.image.identifier | player.image.cropMode | player.country.name | player.country.code | player.country.flag.type | player.country.flag.identifier | player.isAmateur | toPar.value | toPar.format | toPar.displayValue | toParToday.value | toParToday.format | toParToday.displayValue | totalScore.value | totalScore.format | totalScore.displayValue | position.value | position.format | position.displayValue | holesThrough.value | holesThrough.format | holesThrough.displayValue | liveVideo.identifier | liveVideo.isLive | score.value | score.format | score.displayValue | toPar.value_rs | toPar.format_rs | toPar.displayValue_rs |
| 0 | True | 56278 | Matthew | Wolff | center | imageCloudinary | us-open/players/2020-players/Matthew_Wolff | fill | United States | usa | imageCloudinary | us-open/flags/usa | False | -5 | absolute | -5 | -5 | absolute | -5 | 140.0 | absolute | 140 | 1 | absolute | 1 | 10 | absolute | 10 | NaN | NaN | 66 | absolute | 66 | -4 | absolute | -4 |
| 1 | True | 56278 | Matthew | Wolff | center | imageCloudinary | us-open/players/2020-players/Matthew_Wolff | fill | United States | usa | imageCloudinary | us-open/flags/usa | False | -5 | absolute | -5 | -5 | absolute | -5 | 140.0 | absolute | 140 | 1 | absolute | 1 | 10 | absolute | 10 | NaN | NaN | 74 | absolute | 74 | 4 | absolute | +4 |
| 2 | True | 56278 | Matthew | Wolff | center | imageCloudinary | us-open/players/2020-players/Matthew_Wolff | fill | United States | usa | imageCloudinary | us-open/flags/usa | False | -5 | absolute | -5 | -5 | absolute | -5 | 140.0 | absolute | 140 | 1 | absolute | 1 | 10 | absolute | 10 | NaN | NaN | 0 | absolute | | -5 | absolute | -5 |
| 3 | True | 34360 | Patrick | Reed | center | imageCloudinary | us-open/players/2019-players/Patrick-Reed | fill | United States | usa | imageCloudinary | us-open/flags/usa | False | -4 | absolute | -4 | 0 | absolute | E | 136.0 | absolute | 136 | 2 | absolute | 2 | 7 | absolute | 7 | NaN | NaN | 66 | absolute | 66 | -4 | absolute | -4 |
| 4 | True | 34360 | Patrick | Reed | center | imageCloudinary | us-open/players/2019-players/Patrick-Reed | fill | United States | usa | imageCloudinary | us-open/flags/usa | False | -4 | absolute | -4 | 0 | absolute | E | 136.0 | absolute | 136 | 2 | absolute | 2 | 7 | absolute | 7 | NaN | NaN | 70 | absolute | 70 | 0 | absolute | E |
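To get from the wide frame to just player, round, score, score to par, and total, one option is to number the exploded rows per player and then select a few columns. This is a rough sketch on a made-up sample that mimics the feed's shape; the sample values and the round column are assumptions for illustration, and the real feed may contain extra or missing fields.

```python
import pandas as pd

# Hypothetical sample mimicking the feed's shape; real values will differ
sample = {
    "standings": [
        {
            "player": {"firstName": "Ann", "lastName": "Lee"},
            "toPar": {"value": -3},
            "totalScore": {"value": 137},
            "roundScores": [
                {"score": {"value": 68}, "toPar": {"value": -2}},
                {"score": {"value": 69}, "toPar": {"value": -1}},
            ],
        },
        {
            "player": {"firstName": "Bo", "lastName": "Kim"},
            "toPar": {"value": 1},
            "totalScore": {"value": 141},
            "roundScores": [
                {"score": {"value": 70}, "toPar": {"value": 0}},
                {"score": {"value": 71}, "toPar": {"value": 1}},
            ],
        },
    ]
}

# same steps as above: normalize, explode, normalize again, join
df = pd.json_normalize(sample, "standings")
df = df.explode("roundScores").reset_index(drop=True)
rounds = pd.json_normalize(df["roundScores"].tolist())
df = df.join(rounds, rsuffix="_rs").drop(columns=["roundScores"])

# number each player's rows 1, 2, ... in the order the rounds appear
df["round"] = df.groupby(["player.firstName", "player.lastName"]).cumcount() + 1

# keep only the columns of interest
out = df[["player.firstName", "player.lastName", "round",
          "score.value", "toPar.value_rs", "toPar.value", "totalScore.value"]]
print(out)
```

Here toPar.value_rs is the per-round score to par, while toPar.value is the player's overall score to par.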
standings is just one of the keys from the downloaded JSON:
r = requests.get(url).json()
print(r.keys())
[out]:
dict_keys(['currentRound', 'standings', 'fullLegend', 'shortLegend', 'inlineLegend', 'cutLine', 'meta'])
Simple and Quick Solution. A better solution might exist using pandas' json_normalize, but this works fairly well for your use case.
def func(x):
    # skip rows with missing fields; those rows return None
    if not any(x.isnull()):
        return (x['round'], x['player']['firstName'], x['player']['identifier'], x['toParToday']['value'], x['totalScore']['value'])

# data is the dict loaded from the leaderboard JSON in the question
df = pd.DataFrame(data['standings'])
df['round'] = data['currentRound']['name']
df = df[['player', 'toPar', 'toParToday', 'totalScore', 'round']]
# drop the None results so every remaining entry is a 5-tuple
info = df.apply(func, axis=1).dropna()
info_df = pd.DataFrame(list(info.values), columns=['Round', 'player_name', 'pid', 'to_par_today', 'totalScore'])
info_df.head()
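For comparison, the json_normalize route mentioned above might look roughly like this for a one-row-per-player leaderboard. It is only a sketch on a made-up sample: the key names follow the ones used in this thread, the values are invented, and the output column names (player, to_par, total, round) are my own choice.

```python
import pandas as pd

# Hypothetical sample; key names follow the thread, values are invented
sample = {
    "currentRound": {"name": "Round 2"},
    "standings": [
        {"player": {"firstName": "Ann", "lastName": "Lee"},
         "toPar": {"value": -3}, "totalScore": {"value": 137}},
        {"player": {"firstName": "Bo", "lastName": "Kim"},
         "toPar": {"value": 1}, "totalScore": {"value": 141}},
    ],
}

# flatten each standings record into dotted columns
flat = pd.json_normalize(sample, record_path="standings")

# build a compact leaderboard from the flattened columns
leaderboard = pd.DataFrame({
    "player": flat["player.firstName"] + " " + flat["player.lastName"],
    "to_par": flat["toPar.value"],
    "total": flat["totalScore.value"],
    "round": sample["currentRound"]["name"],  # same value for every row
})
print(leaderboard)
```

This avoids the row-wise apply, at the cost of relying on every record having the same nested keys.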