I'm trying to scrape the "Major Stock Indexes" table from https://markets.wsj.com/ and would like to save it to a folder on my desktop. This is what I have so far:
import urllib.request
import json
import re
html = urllib.request.urlopen("https://markets.wsj.com/").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])
filename = r"C:\Users\me\folder\sample.csv"  # raw string so \U is not read as a unicode escape
f = open(filename, "w")
for numbers in data['chart']:
    for obs in numbers['Major Stock Indexes']:
        f.write(str(obs['firstCol']) + "," + str(obs['dataCol']) + "," + str(obs['dataCol priceUp']) + str(obs['dataCol lastb priceUp']) + "\n")
print(obs.keys())
I'm getting the error: IndexError: list index out of range
Any ideas what might fix my issue?
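Your json_data is an empty list ([]): the regex finds no match in the page, so json_data[0] raises the IndexError. Instead of a regex, use an HTML parsing library such as BeautifulSoup (bs4), as below: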
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen("https://markets.wsj.com/").read().decode('utf8')
soup = BeautifulSoup(html, 'html.parser') # parse your html
t = soup.find('table', {'summary': 'Major Stock Indexes'}) # finds tag table with attribute summary equals to 'Major Stock Indexes'
tr = t.find_all('tr') # get all table rows from selected table
row_lis = [i.find_all('td') if i.find_all('td') else i.find_all('th') for i in tr if i.text.strip()] # construct list of data
print([','.join(x.text.strip() for x in i) for i in row_lis])
Output:
[',Last,Change,% CHG,',
'DJIA,26049.64,259.29,1.01%',
'Nasdaq,8017.90,71.92,0.91%',
'S&P 500,2896.74,22.05,0.77%',
'Russell 2000,1728.41,2.73,0.16%',
'Global Dow,3105.09,3.73,0.12%',
'Japan: Nikkei 225,22930.58,130.94,0.57%',
'Stoxx Europe 600,385.57,2.01,0.52%',
'UK: FTSE 100,7577.49,14.27,0.19%']
Now you can iterate over this list and write each row to a CSV file instead of printing it.
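As a rough sketch of that last step, the standard csv module can write the rows directly. The names save_rows_to_csv, sample_rows and out_path are made up for illustration, and the sample rows are copied from the output above rather than fetched live. In the real script you would presumably build the rows from row_lis, keeping each row as a list of cell strings instead of joining them with commas.

```python
import csv
import os
import tempfile


def save_rows_to_csv(rows, path):
    """Write rows (each a list of cell strings) to a CSV file at path."""
    # newline="" prevents the csv module from adding blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Sample rows taken from the output above. With the scraper, an assumed
# equivalent would be:
#   rows = [[cell.text.strip() for cell in r] for r in row_lis]
sample_rows = [
    ["", "Last", "Change", "% CHG", ""],
    ["DJIA", "26049.64", "259.29", "1.01%"],
    ["Nasdaq", "8017.90", "71.92", "0.91%"],
]

# Hypothetical output location; swap in your own folder,
# e.g. r"C:\Users\me\folder\sample.csv".
out_path = os.path.join(tempfile.gettempdir(), "sample.csv")
save_rows_to_csv(sample_rows, out_path)
```

Using csv.writer rather than joining with "," by hand means any cell that itself contains a comma gets quoted correctly.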