Yahoo Finance updated their website. I had an lxml/etree script that used to extract the analyst recommendations. The analyst recommendations are still there now, but only as a graphic. You can see an example on this page. The graph called Recommendation Trends in the right-hand column shows the number of analyst reports rating the stock strong buy, buy, hold, underperform, and sell.
My guess is that Yahoo will make a few more adjustments to the page over the coming weeks, but it got me wondering whether such data is still extractable in any reasonable way.
I used to get the source like this:

import urllib.request
from lxml import etree

url = 'https://finance.yahoo.com/quote/'+code+'/analyst?p='+code
tree = etree.HTML(urllib.request.urlopen(url).read())

and then find the data in the HTML tree. But obviously that's impossible now.
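For reference, the extraction step on the old server-rendered page amounted to walking the parsed tree with XPath-style queries. A minimal self-contained sketch of that technique, using the stdlib ElementTree against a static, hypothetical markup snippet (lxml's etree exposes the same find/findall interface, plus a forgiving HTML parser):

```python
import xml.etree.ElementTree as ET

# hypothetical markup standing in for the old server-rendered page
html = """
<html><body>
  <table id="analysts">
    <tr><td>Strong Buy</td><td>6</td></tr>
    <tr><td>Buy</td><td>5</td></tr>
    <tr><td>Hold</td><td>12</td></tr>
  </table>
</body></html>
"""

tree = ET.fromstring(html)
# pull (rating, count) pairs out of the table rows
rows = tree.findall(".//table[@id='analysts']/tr")
ratings = {row[0].text: int(row[1].text) for row in rows}
print(ratings)  # {'Strong Buy': 6, 'Buy': 5, 'Hold': 12}
```

This only works when the numbers are present in the HTML, which is exactly what broke with the redesign.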
The page is quite dynamic and involves a lot of JavaScript executed in the browser. Following @Padraic's advice about switching to selenium, here is a complete working sample that produces a month-to-trend dictionary at the end. The value of each bar is calculated as its proportion of the total bar height:
from pprint import pprint

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://finance.yahoo.com/quote/CSX/analysts?p=CSX")

# wait for the chart to be visible
wait = WebDriverWait(driver, 10)
trends = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section[data-reactid$=trends]")))
chart = trends.find_element_by_css_selector("svg.ratings-chart")

# get labels
month_names = [month.text for month in chart.find_elements_by_css_selector("g.x-axis g.tick")]
trend_names = [trend.text for trend in trends.find_elements_by_css_selector("table tr > td:nth-of-type(2)")]

# construct month-to-trend dictionary
data = {}
months = chart.find_elements_by_css_selector("g[transform]:not([class])")
for month_name, month_data in zip(month_names, months):
    total = month_data.find_element_by_css_selector("text.total").text
    data[month_name] = {'total': total}

    bars = month_data.find_elements_by_css_selector("g.bar rect")

    # calculate the value of each bar as an integer percentage of the total bar height
    heights = {trend_name: int(bar.get_attribute("height")) for trend_name, bar in zip(trend_names[::-1], bars)}
    total_height = sum(heights.values())
    for trend_name in trend_names:
        data[month_name][trend_name] = heights[trend_name] * 100 // total_height

driver.close()
pprint(data)
Prints:
{u'Aug': {u'Buy': 19,
          u'Hold': 45,
          u'Sell': 3,
          u'Strong Buy': 22,
          u'Underperform': 8,
          'total': u'26'},
 u'Jul': {u'Buy': 18,
          u'Hold': 44,
          u'Sell': 3,
          u'Strong Buy': 25,
          u'Underperform': 7,
          'total': u'27'},
 u'Jun': {u'Buy': 21,
          u'Hold': 38,
          u'Sell': 3,
          u'Strong Buy': 28,
          u'Underperform': 7,
          'total': u'28'},
 u'May': {u'Buy': 21,
          u'Hold': 38,
          u'Sell': 3,
          u'Strong Buy': 28,
          u'Underperform': 7,
          'total': u'28'}}
The total values are the labels that you see on top of each bar.
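Since each bar value is a percentage, the total label can be used to recover approximate per-category analyst counts. A small illustrative sketch, using the Aug figures printed above:

```python
# the Aug percentages and 'total' label from the output above
aug = {'Buy': 19, 'Hold': 45, 'Sell': 3, 'Strong Buy': 22, 'Underperform': 8}
total = 26

# scale each percentage by the total and round to the nearest analyst
counts = {name: round(pct / 100 * total) for name, pct in aug.items()}
print(counts)  # {'Buy': 5, 'Hold': 12, 'Sell': 1, 'Strong Buy': 6, 'Underperform': 2}
print(sum(counts.values()))  # 26 here, though rounding can make this drift by one
```

These are estimates reconstructed from pixel heights, so they may be off by one analyst per category.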
Hope this is at least a good start for you. Let me know if you want me to elaborate on any part of the code or need any additional information.
As the comments say, they have moved to ReactJS, so lxml is no longer to the point because there is no data in the HTML page any more. Now you need to look around and find the endpoint they are pulling the data from. In the case of Recommendation Trends, it's the quoteSummary endpoint used below.
#!/usr/bin/env python3

import json
from pprint import pprint
from urllib.request import urlopen
from urllib.parse import urlencode


def parse():
    host = 'https://query2.finance.yahoo.com'
    path = '/v10/finance/quoteSummary/CSX'
    params = {
        'formatted': 'true',
        'lang': 'en-US',
        'region': 'US',
        'modules': 'recommendationTrend',
    }

    response = urlopen('{}{}?{}'.format(host, path, urlencode(params)))
    data = json.loads(response.read().decode())
    pprint(data)


if __name__ == '__main__':
    parse()
The output looks like this:
{
    'quoteSummary': {
        'error': None,
        'result': [{
            'recommendationTrend': {
                'maxAge': 86400,
                'trend': [{
                    'buy': 0,
                    'hold': 0,
                    'period': '0w',
                    'sell': 0,
                    'strongBuy': 0,
                    'strongSell': 0
                }, {
                    'buy': 0,
                    'hold': 0,
                    'period': '-1w',
                    'sell': 0,
                    'strongBuy': 0,
                    'strongSell': 0
                }, {
                    'buy': 5,
                    'hold': 12,
                    'period': '0m',
                    'sell': 2,
                    'strongBuy': 6,
                    'strongSell': 1
                }, {
                    'buy': 5,
                    'hold': 12,
                    'period': '-1m',
                    'sell': 2,
                    'strongBuy': 7,
                    'strongSell': 1
                }, {
                    'buy': 6,
                    'hold': 11,
                    'period': '-2m',
                    'sell': 2,
                    'strongBuy': 8,
                    'strongSell': 1
                }, {
                    'buy': 6,
                    'hold': 11,
                    'period': '-3m',
                    'sell': 2,
                    'strongBuy': 8,
                    'strongSell': 1
                }]
            }
        }]
    }
}
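To turn that response into something usable, drill down through quoteSummary to the trend list and index it by period. A sketch against the structure shown above, using a trimmed copy of the response as sample input:

```python
import json

# a trimmed version of the quoteSummary response shown above
raw = '''
{"quoteSummary": {"error": null, "result": [{"recommendationTrend": {
  "maxAge": 86400,
  "trend": [
    {"buy": 5, "hold": 12, "period": "0m",  "sell": 2, "strongBuy": 6, "strongSell": 1},
    {"buy": 5, "hold": 12, "period": "-1m", "sell": 2, "strongBuy": 7, "strongSell": 1}
  ]}}]}}
'''

data = json.loads(raw)
# the useful part lives under result[0]
trend = data['quoteSummary']['result'][0]['recommendationTrend']['trend']
by_period = {entry['period']: entry for entry in trend}
print(by_period['0m']['hold'])  # 12
```

Checking `data['quoteSummary']['error']` before indexing into `result` would make this more robust against failed requests.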
What I did was roughly: load the page with the browser's developer tools open, watch the XHR requests in the Network tab, and look for the one whose response contains the recommendation data.