I am using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:
import urllib
import re
name = raw_input(">")
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)
htmltext = htmlfile.read()
# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)
print price
The idea is that I enter a ticker symbol and the stock price comes out. But so far I can't get it to display a price, just an empty list [ ]. I have added a comment above where I believe the problem is. Any suggestions? Thanks.
Try escaping the forward slash in your regex (Python's re module treats / as an ordinary character, so the escape is optional, but it does no harm). Change your regex from:
<span id="yfs_l84_%s">(.+?)</span>
to
<span id="yfs_l84_goog">(.+?)<\/span>
This will fix your problem, assuming you enter the company's listing code as the input to your code, for example goog for Google.
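To check a pattern like this without hitting the network, it can be run against a small HTML fragment. The sketch below is only an illustration: the sample markup and the `extract_price` helper are assumptions, not Yahoo's actual page, and it is written for Python 3. It shows that an id that does not match exactly (for instance, different letter case) yields no match, which is one way to end up with an empty result.

```python
import re

def extract_price(html, symbol):
    """Return the text of the span whose id is yfs_l84_<symbol>, or None."""
    # re.escape guards against symbols containing regex metacharacters.
    pattern = re.compile(r'<span id="yfs_l84_%s">(.+?)</span>' % re.escape(symbol))
    match = pattern.search(html)
    return match.group(1) if match else None

# Hypothetical page fragment, for illustration only.
sample_html = '<div><span id="yfs_l84_goog">1,021.57</span></div>'

print(extract_price(sample_html, "goog"))  # prints 1,021.57
print(extract_price(sample_html, "GOOG"))  # prints None (the id is case-sensitive)

# '/' is not special in Python's re, so both spellings match the same text.
print(re.findall(r'<\/span>', sample_html) == re.findall(r'</span>', sample_html))  # prints True
```

If the live page uses a different id or different casing than the input, normalizing the input (for example with `symbol.lower()`) before building the pattern may be worth trying.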
That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup which is a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:
from bs4 import BeautifulSoup
import requests
name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})
print data.text
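The idea of looking up an element by its id with a parser, rather than a regex, can be tried offline too. The sketch below deliberately swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs; the `SpanTextById` class, the `find_span_text` helper, and the sample markup are all hypothetical, and the code targets Python 3. It returns None when the id is absent, a case worth handling because a missing element would otherwise raise an error when its text is read.

```python
from html.parser import HTMLParser

class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> that has the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False   # True while we are inside the target span
        self.found = False    # True once the target span has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.found and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        # Simplification: assumes the target span has no nested spans.
        if tag == "span" and self.inside:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)

def find_span_text(html, target_id):
    parser = SpanTextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.found else None

# Hypothetical page fragment, for illustration only.
sample_html = '<div><span id="yfs_l84_goog">1,021.57</span></div>'

print(find_span_text(sample_html, "yfs_l84_goog"))  # prints 1,021.57
print(find_span_text(sample_html, "yfs_l84_msft"))  # prints None
```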
Any reason you can't use pandas? It has good support for financial data scraping and time series analysis.
http://pandas.pydata.org/pandas-docs/stable/remote_data.html
Here's the Yahoo example straight from the documentation:
In [1]: import pandas.io.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f=web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]:
Open 10.17
High 10.28
Low 10.05
Close 10.28
Volume 60855800.00
Adj Close 9.75
Name: 2010-01-04 00:00:00, dtype: float64
The best way to get data from Yahoo Finance with Python 2 or Python 3 is to use a POST request.
You can easily test this out with a REST client like Postman.
Open Postman, select the POST method, and enter the download URL. You will then see the data. Simply re-create this in Python:
import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1519938930&period2=1522354530&interval=1d&events=history&crumb=.tLvYBkGDu3"
response = requests.post(url)
print(response.text)
I used to get the data using urllib2, but it now gives an authorization error. They are probably filtering everything through REST methods like GET and POST.
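The `period1` and `period2` values in that URL look like Unix timestamps in seconds, so the query string can be assembled from dates instead of being typed by hand. The following is a rough sketch under that assumption, written for Python 3; `build_download_url` is a hypothetical helper, the crumb passed in is a placeholder, and nothing here checks whether the server will accept the request.

```python
import calendar
import datetime
from urllib.parse import urlencode

BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download/"

def build_download_url(symbol, start, end, crumb):
    """Assemble a download URL; start and end are naive datetimes treated as UTC."""
    params = [
        ("period1", calendar.timegm(start.timetuple())),  # seconds since the epoch
        ("period2", calendar.timegm(end.timetuple())),
        ("interval", "1d"),
        ("events", "history"),
        ("crumb", crumb),  # placeholder; take the real value from your own session
    ]
    return BASE_URL + symbol + "?" + urlencode(params)

url = build_download_url(
    "GOOG",
    datetime.datetime(2018, 3, 1),
    datetime.datetime(2018, 3, 29),
    "EXAMPLE",
)
print(url)
```

Building the query with `urlencode` also avoids stray spaces or unescaped characters in the URL, which are easy to introduce when pasting it as one long string.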