I'm trying to scrape some data from a web page and put it into a pandas dataframe. I've read a lot and tried many things, but I can't get what I want: a dataframe with all the data in separate columns and rows. Below is my code.
import requests
import json
import pandas as pd
r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php')
a = json.loads(r.text)
res = pd.json_normalize(a)  # replaces the old "from pandas.io.json import json_normalize"
##print(res)
df = pd.DataFrame(res)
print(df)
##df = pd.read_json(a)
##print(df)
pd.read_json(a) doesn't seem to work at all. Could someone give it a try?
Thanks for all the help in advance.
Best regards, David
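A minimal sketch of why both attempts above fall short, assuming the response has the {"cols": [...], "rows": [...]} layout that the answers below index into (the payload here is a short hypothetical stand-in, not the real data): json_normalize treats a single dict as one record, so the two top-level lists end up as two cells of a one-row frame, and pd.read_json expects JSON text, a path or a file-like object rather than the dict that json.loads has already returned.

```python
import io
import json

import pandas as pd

# Hypothetical, shortened payload in the {"cols": [...], "rows": [...]} layout;
# the real response is not reproduced here.
text = json.dumps({
    "cols": [{"label": "Country"}, {"label": "CAPE"}],
    "rows": [
        {"c": [{"v": "A"}, {"v": 1.5}]},
        {"c": [{"v": "B"}, {"v": 2.5}]},
    ],
})
a = json.loads(text)

# A single dict is one record: both lists land in the cells of a single row.
one_row = pd.json_normalize(a)
print(one_row.shape)  # (1, 2)

# read_json wants JSON text (wrapped in StringIO here), not a parsed dict.
# Flat records-style JSON like this parses straight into a table.
records = '[{"Country": "A", "CAPE": 1.5}, {"Country": "B", "CAPE": 2.5}]'
table = pd.read_json(io.StringIO(records), orient="records")
print(table.shape)  # (2, 2)
```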
Or, more simply:
import requests
import pandas as pd
r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php')
j = r.json()
df = pd.DataFrame.from_dict(j)
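For reference, a minimal sketch of what DataFrame.from_dict does with a hypothetical dict of lists: with the default orient="columns" each key becomes a column (and the lists must all be the same length), while orient="index" turns each key into a row label instead.

```python
import pandas as pd

# Hypothetical data: one key per field, equal-length lists of values.
data = {"Country": ["A", "B"], "CAPE": [1.5, 2.5]}

# Default orient="columns": keys become column names.
by_columns = pd.DataFrame.from_dict(data)
print(by_columns)

# orient="index": keys become the row index; columns are numbered 0, 1, ...
by_index = pd.DataFrame.from_dict(data, orient="index")
print(by_index)
```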
You can do it this way:
import requests
import pandas as pd
r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php')
j = r.json()
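# Each row is {'c': [{'v': value}, ...]}; take the 'v' of every cell.
# Column names come from the 'label' of each entry in j['cols'].
df = pd.DataFrame([[d['v'] for d in x['c']] for x in j['rows']],
                  columns=[d['label'] for d in j['cols']])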
Result:
In [217]: df
Out[217]:
                   Country  Weight  CAPE    PE    PC   PB   PS   DY  RS 26W  RS 52W  Score
 0                  Russia     1.1   5.9   9.1   5.1  1.0  0.9  3.7    1.22    1.35    1.0
 1                   China     1.1  12.8   7.2   4.5  0.9  0.6  4.2    1.05    1.13    2.0
 2                   Italy     1.0  12.7  31.5   5.7  1.2  0.6  3.3    1.13    1.11    3.0
 3                 Austria     0.2  14.3  21.7   7.3  1.1  0.7  2.5    1.10    1.15    4.0
 4                  Norway     0.4  12.8  32.4   7.4  1.6  1.2  4.0    1.10    1.17    5.0
 5                 Hungary     0.0  12.5  49.8   7.5  1.4  0.7  2.3    1.12    1.19    6.0
 6                   Spain     1.2  11.7  24.7   7.0  1.4  1.2  3.7    1.08    1.11    7.0
 7                   Czech     0.0   8.9  13.6   6.1  1.3  1.0  6.7    1.03    1.05    8.0
 8                  Brazil     1.3   9.8  42.1   7.4  1.6  1.2  3.0    1.06    1.24    9.0
 9                Portugal     0.1  11.3  29.0   4.8  1.5  0.7  3.9    1.05    1.06   10.0
..                     ...     ...   ...   ...   ...  ...  ...  ...     ...     ...    ...
42        EMERGING MARKETS    13.5  14.0  16.0   8.8  1.6  1.3  2.9    1.04    1.11    NaN
43        DEVELOPED EUROPE    22.4  16.6  26.5   9.9  1.8  1.1  3.2    1.06    1.08    NaN
44         EMERGING EUROPE     1.7   8.6  10.9   5.8  1.1  0.8  3.4    1.13    1.20    NaN
45        EMERGING AMERICA     3.0  15.2  30.1   9.4  1.9  1.2  2.4    1.03    1.11    NaN
46  DEVELOPED ASIA-PACIFIC    17.7   NaN  17.7   8.8  1.3  0.9  2.5    1.03    1.09    NaN
47   EMERGING ASIA-PACIFIC     6.9  14.9  15.1   9.1  1.8  1.4  2.7    1.01    1.08    NaN
48         EMERGING AFRICA     0.8   NaN  16.5  10.6  2.0  1.4  3.8    1.06    1.12    NaN
49             MIDDLE EAST     1.3   NaN  13.7  11.8  1.5  1.8  3.9    1.06    1.10    NaN
50                    BRIC     5.9  11.8  14.6   7.4  1.4  1.2  2.7    1.06    1.16    NaN
51     OTHER EMERGING MKT.     2.5   NaN  17.7  12.9  1.8  1.5  3.1    1.16    1.20    NaN
[52 rows x 11 columns]
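The same list-comprehension technique as a self-contained sketch that runs without the network. The payload is a small hypothetical stand-in for r.json(), shaped the way the answer indexes it (j['cols'][i]['label'] for headers, j['rows'][k]['c'][i]['v'] for cell values); the real response may carry extra keys that this ignores.

```python
import pandas as pd

# Hypothetical stand-in for r.json(); only the shape matters here.
j = {
    "cols": [{"label": "Country"}, {"label": "CAPE"}, {"label": "Score"}],
    "rows": [
        {"c": [{"v": "A"}, {"v": 1.5}, {"v": 1.0}]},
        {"c": [{"v": "B"}, {"v": 2.5}, {"v": 2.0}]},
    ],
}

# One inner list per row: the 'v' value of every cell, in column order.
values = [[cell["v"] for cell in row["c"]] for row in j["rows"]]
# Column headers come from each column descriptor's 'label'.
labels = [col["label"] for col in j["cols"]]

df = pd.DataFrame(values, columns=labels)
print(df)
```

Building the row lists and the labels separately keeps each comprehension readable; the result is the same as the one-liner in the answer.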
And one step simpler than Justin's (already helpful) response: put .json() at the end of the r = requests.get line.
import requests
import pandas as pd
r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php').json()
df = pd.DataFrame.from_dict(r)
You may also want pd.json_normalize when your data isn't shaped exactly the way from_dict() expects.
For example:
import pandas as pd

data = [
{
"id": 1,
"name": "Cole Volk",
"fitness": {"height": 130, "weight": 60},
},
{"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
{
"id": 2,
"name": "Faye Raker",
"fitness": {"height": 130, "weight": 60},
},
]
pd.json_normalize(data, max_level=1)
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
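To show what max_level controls, here is a minimal sketch with a hypothetical record nested two levels deep: max_level=0 flattens nothing, max_level=1 flattens one level (deeper dicts stay as dict values in a cell), and the default max_level=None flattens everything into dotted column names.

```python
import pandas as pd

# Hypothetical record: "weight" is itself a nested dict.
data = [{"id": 1, "fitness": {"height": 130, "weight": {"value": 60, "unit": "kg"}}}]

# max_level=0: "fitness" stays a single dict-valued column.
flat0 = pd.json_normalize(data, max_level=0)
# max_level=1: "fitness.height" is flattened, "fitness.weight" is still a dict.
flat1 = pd.json_normalize(data, max_level=1)
# Default (max_level=None): every level becomes its own dotted column.
flat_all = pd.json_normalize(data)

for frame in (flat0, flat1, flat_all):
    print(frame.columns.tolist())
```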