
Using Python and urllib to get data from Yahoo Finance

I am using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:

import urllib
import re

name = raw_input(">")

htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)

htmltext = htmlfile.read()

# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print price

The idea is that I enter a ticker symbol and the stock price comes out. But so far I can't get it to display a price, just an empty list []. I have commented where I believe the problem is. Any suggestions? Thanks.

ng150716 asked Apr 16 '14 04:04


3 Answers

You have not escaped the forward slash in your regex. Change your regex from:

<span id="yfs_l84_%s">(.+?)</span>

to

<span id="yfs_l84_goog">(.+?)<\/span>

This will fix your problem, assuming you enter the company's listing code as the input to your code, for example goog for Google.
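
A note on the escaping: in Python's re module the forward slash is not a special character, so `<\/span>` and `</span>` match the same text. The difference that matters between the two patterns above is more likely the lowercase ticker in the id. Here is a minimal Python 3 sketch, assuming the page uses a lowercase symbol in the id; the sample markup and the find_price helper are made up for illustration:

```python
import re

# Made-up sample of the markup the question is trying to match.
SAMPLE_HTML = '<span id="yfs_l84_goog">1,234.56</span>'

def find_price(html, symbol):
    """Return all captured span texts for the given ticker symbol."""
    # Lowercase the symbol (assumption: the id uses lowercase) and escape it
    # so characters such as '.' or '^' in a ticker are matched literally.
    span_id = re.escape("yfs_l84_" + symbol.lower())
    pattern = re.compile(r'<span id="%s">(.+?)</span>' % span_id)
    return pattern.findall(html)

print(find_price(SAMPLE_HTML, "GOOG"))

# Escaped and unescaped slashes find exactly the same matches.
print(re.findall(r'<\/span>', SAMPLE_HTML) == re.findall(r'</span>', SAMPLE_HTML))
```

With this helper, typing GOOG or goog at the prompt gives the same result.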

That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup which is a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:

from bs4 import BeautifulSoup
import requests

name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
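soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly
# the '{}' placeholder is required, otherwise the ticker is never added to the id
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})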
print data.text
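
If installing BeautifulSoup is not an option, the same lookup can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the code above uses. The sketch is Python 3; the SpanTextById class, the span_text helper, and the sample markup are hypothetical, and spans nested inside the target span are not handled. It returns None when the id is absent, which is the same case where soup.find returns None and data.text would raise an AttributeError.

```python
from html.parser import HTMLParser

class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False   # True while between the target <span> and its </span>
        self.done = False     # True once the target span has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.done and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)

def span_text(html, target_id):
    """Return the stripped text of the matching span, or None if not found."""
    parser = SpanTextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.done else None

# Made-up sample markup for illustration.
sample = '<div><span id="yfs_l84_goog"> 1,234.56 </span><span id="other">x</span></div>'
print(span_text(sample, "yfs_l84_goog"))
print(span_text(sample, "yfs_l84_aapl"))
```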

shaktimaan answered Sep 30 '22 14:09


Any reason you can't use pandas? It has good support for financial data scraping and time series analysis.

http://pandas.pydata.org/pandas-docs/stable/remote_data.html

Here's the Yahoo example straight from the documentation:

In [1]: import pandas.io.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f=web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]: 
Open               10.17
High               10.28
Low                10.05
Close              10.28
Volume       60855800.00
Adj Close           9.75
Name: 2010-01-04 00:00:00, dtype: float64

dranxo answered Sep 30 '22 15:09


The best way to get data from Yahoo Finance using Python 2 or Python 3 is to send a POST request.
You can easily test this out with a REST client such as Postman.

Open Postman, set the method to POST, and send the request to the URL used in the code below. You will then see the data in the response. Simply re-create this request in Python:

import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1519938930&period2=1522354530&interval=1d&events=history&crumb=.tLvYBkGDu3"

response = requests.post(url)
print(response.text)  # the parenthesized form runs on both Python 2 and Python 3

I used to get the data using urllib2, but it gives an authorization error now. They are probably filtering everything through REST methods like GET and POST.
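
The long URL above can also be assembled from its parts, which makes it easier to change the symbol or the date range. Here is a minimal Python 3 sketch, assuming period1 and period2 are Unix timestamps in seconds (they look like it, but that is not confirmed here). build_download_url and to_epoch are hypothetical helpers, and the crumb is simply the value copied from the URL above, which may not be valid for another session:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE = "https://query1.finance.yahoo.com/v7/finance/download/"

def to_epoch(dt):
    """Convert a timezone-aware datetime to whole Unix seconds."""
    return int(dt.timestamp())

def build_download_url(symbol, period1, period2, crumb, interval="1d"):
    """Assemble the download URL from the query parameters seen above."""
    query = urlencode({
        "period1": period1,   # assumed: start of range, Unix seconds
        "period2": period2,   # assumed: end of range, Unix seconds
        "interval": interval,
        "events": "history",
        "crumb": crumb,       # copied from the URL above, may be session-specific
    })
    return BASE + symbol + "?" + query

url = build_download_url("GOOG", 1519938930, 1522354530, ".tLvYBkGDu3")
print(url)
```

The resulting string can be passed to requests.post(url) exactly as in the code above, and to_epoch can supply period1 and period2 from datetime values instead of hand-written numbers.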

Spencer Davis answered Sep 30 '22 14:09