I want to get the max historical price data from Yahoo Finance with Scrapy.
Here is the URL of the max historical price data for one stock (FNMA):
https://query1.finance.yahoo.com/v7/finance/download/FNMA?period1=221115600&period2=1508472000&interval=1d&events=history&crumb=1qRuQKELxmM
To write a stock price web crawler, there are two problems I can't solve.
1. How do I get the argument period1?
You can get it by hand on the web page by clicking "Max".
How can I get it with Python code?
Different stocks have different period1 values.
2. How do I create the argument crumb=1qRuQKELxmM automatically, given that different stocks have different crumb values?
Here is my Scrapy spider for the max historical data of each stock.
import scrapy

class TestSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["finance.yahoo.com"]

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.timeout = 10

    def start_requests(self):
        stock_names = ...  # list of tickers; code omitted
        for stock in stock_names:
            period1 = ...  # how to fill it?
            crumb = ...  # how to fill it?
            per_stock_max_data = (
                "https://query1.finance.yahoo.com/v7/finance/download/"
                + stock + "?period1=" + period1
                + "&period2=1508472000&interval=1d&events=history"
                + "&crumb=" + crumb
            )
            yield scrapy.Request(per_stock_max_data, callback=self.parse)

    def parse(self, response):
        content = response.body
        target = response.url
        # do something
How do I fill in the blanks above in my web crawler?
As I understand it, you want to download all available data for a specific ticker. For that you don't actually need to determine period1: if you pass 0 as period1, the Yahoo API defaults to the oldest available date.
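As a rough sketch of what that means for the URL from the question, the helper below assembles the download URL with period1 defaulting to 0. The function name build_download_url and its parameters are hypothetical, the endpoint and query parameters are simply the ones shown in the question, and the crumb still has to be obtained separately.

```python
import time
from urllib.parse import quote, urlencode

# Endpoint copied from the URL in the question.
BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download/"


def build_download_url(ticker, crumb, period1=0, period2=None):
    """Build the CSV download URL; period1=0 asks for the full history."""
    if period2 is None:
        period2 = int(time.time())  # current time as a Unix timestamp
    query = urlencode({
        "period1": period1,
        "period2": period2,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,
    })
    return BASE_URL + quote(ticker) + "?" + query


# Same ticker, end date and crumb as in the question, but with period1=0.
print(build_download_url("FNMA", "1qRuQKELxmM", period2=1508472000))
```

Using urlencode rather than string concatenation also takes care of escaping any special characters in the crumb.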
To download quotes the way you showed in the question, we unfortunately have to deal with cookies. Here is a solution that doesn't use Scrapy; only the ticker itself is required:
import re
import time
from io import StringIO

import pandas as pd
import requests

def get_yahoo_ticker_data(ticker):
    # Load the history page to obtain the 'B' cookie and the crumb.
    res = requests.get('https://finance.yahoo.com/quote/' + ticker + '/history')
    yahoo_cookie = res.cookies['B']
    yahoo_crumb = None
    pattern = re.compile(r'.*"CrumbStore":\{"crumb":"(?P<crumb>[^"]+)"\}')
    for line in res.text.splitlines():
        m = pattern.match(line)
        if m is not None:
            yahoo_crumb = m.groupdict()['crumb']
    # period1=0 requests data from the oldest available date; period2 is now.
    current_date = int(time.time())
    url_kwargs = {'symbol': ticker, 'timestamp_end': current_date,
                  'crumb': yahoo_crumb}
    url_price = ('https://query1.finance.yahoo.com/v7/finance/download/'
                 '{symbol}?period1=0&period2={timestamp_end}&interval=1d&events=history'
                 '&crumb={crumb}').format(**url_kwargs)
    # The crumb is only accepted together with the cookie it was issued with.
    response = requests.get(url_price, cookies={'B': yahoo_cookie})
    return pd.read_csv(StringIO(response.text), parse_dates=['Date'])
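To see the crumb extraction step in isolation, here is a small sketch that runs the same CrumbStore pattern against a made-up page fragment. The sample text and the helper name extract_crumb are hypothetical. It uses re.search over the whole text instead of matching line by line, and it decodes the value as a JSON string on the assumption that the crumb may contain escapes such as \u002F.

```python
import json
import re

# Same CrumbStore pattern as in get_yahoo_ticker_data, without the leading '.*'
# because re.search scans the whole text.
CRUMB_RE = re.compile(r'"CrumbStore":\{"crumb":"(?P<crumb>[^"]+)"\}')


def extract_crumb(page_text):
    """Return the crumb embedded in the page text, or None if it is absent."""
    m = CRUMB_RE.search(page_text)
    if m is None:
        return None
    # Assumption: the crumb is a JSON string value, so JSON escapes
    # (for example \u002F for '/') are decoded here.
    return json.loads('"' + m.group("crumb") + '"')


# Hypothetical fragment of the history page, for illustration only.
sample = 'root.App.main = {"context":{},"CrumbStore":{"crumb":"1qRuQKELxmM"},"x":1};'
print(extract_crumb(sample))
```

Returning None when nothing matches makes it easy to detect a changed page layout before sending the download request.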
If you really need the oldest date, you can call get_yahoo_ticker_data as defined above and read the first date from the returned data:
get_yahoo_ticker_data(ticker='AAPL')
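Reading off that first date and converting it to a period1 value could look like the sketch below. It works on a hypothetical two-row sample in the same CSV layout rather than on a live download, and the helper name first_date_as_period1 is made up for illustration. The Date column is assumed to be parsed as timezone-naive timestamps, which are treated as UTC here.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows; real data would come from get_yahoo_ticker_data.
sample_csv = """Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.51,0.52,0.51,0.51,0.41,117258400
1980-12-15,0.49,0.49,0.49,0.49,0.39,43971200
"""
df = pd.read_csv(StringIO(sample_csv), parse_dates=["Date"])


def first_date_as_period1(frame):
    """Return the earliest date and its Unix timestamp in seconds (UTC)."""
    first = frame["Date"].min()
    seconds = (first - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    return first, int(seconds)


first, period1 = first_date_as_period1(df)
print(first.date(), period1)
```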
I know that web scraping is not an efficient option, but it's the only option we have because Yahoo has already decommissioned all of its APIs. You might find third-party solutions, but all of them use scraping internally and add boilerplate code that decreases overall performance.
After installing pandas-datareader with:
pip install pandas-datareader
you can request the stock prices with this code:
import pandas_datareader as pdr
from datetime import datetime
appl = pdr.get_data_yahoo(symbols='AAPL', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(appl['Adj Close'])