Google's finance API is incomplete -- many of the figures on a page such as:
http://www.google.com/finance?fstype=ii&q=NYSE:GE
are not available via the API.
I need this data to rank companies on Canadian stock exchanges according to Greenblatt's formula, which you can find by searching Google for "greenblatt index scans".
My question: what is the most intelligent, clean, and efficient way of accessing and processing the data on these web pages? Is the tedious approach really necessary in this case, and if so, what is the best way of going about it? I'm currently learning Python for projects related to this one.
The Google Finance API allows you to leverage historical securities data from Google Finance so that you can build customized analysis.
Data is provided by financial exchanges and other content providers and may be delayed as specified by financial exchanges or other data providers. Google does not verify any data and disclaims any obligation to do so.
The Google Finance Gadget API was officially deprecated in October 2012. It was still active as of April 2014, but it is completely dead as of March 2022. Note that if your application is for public consumption, using the Google Finance API is against Google's terms of service.
You could try asking Google to provide the missing APIs. Otherwise, you're stuck with screen scraping, which is never fun, is prone to breaking without notice, and is likely in violation of Google's terms of service.
But if you still want to write a screen scraper, it's hard to beat a combination of mechanize and BeautifulSoup. BeautifulSoup is an HTML parser, and mechanize is a Python library that behaves like a programmable web browser: it lets you log in, store cookies, and generally navigate around as any other web browser would.
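As a rough illustration of the parsing half, here is a minimal sketch using BeautifulSoup on an HTML string you have already downloaded (with mechanize or anything else). The markup, the `fs-table` id, the column labels, and the figures are all made up for the example; the real page's structure is an assumption you would need to check by inspecting its source, and `parse_number` and `parse_financials` are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for a downloaded financials page.
# The table id, labels, and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table id="fs-table">
  <tr><th>In Millions</th><th>FY2</th><th>FY1</th></tr>
  <tr><td>Revenue</td><td>1,500.00</td><td>1,250.50</td></tr>
  <tr><td>Operating Income</td><td>300.25</td><td>(45.00)</td></tr>
  <tr><td>Net Income</td><td>-</td><td>210.00</td></tr>
</table>
"""


def parse_number(text):
    """Turn '1,250.50' or '(45.00)' into a float; return None for blank cells."""
    cleaned = text.strip().replace(",", "")
    if cleaned in ("", "-"):
        return None
    # Assumes accounting-style parentheses denote a negative value.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)


def parse_financials(html, table_id="fs-table"):
    """Return {row label: {period: value}} for the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError("table %r not found; the page layout may have changed" % table_id)

    rows = table.find_all("tr")
    # The first header cell is the units label; the rest name the periods.
    periods = [th.get_text(strip=True) for th in rows[0].find_all("th")[1:]]

    data = {}
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != len(periods) + 1:
            continue  # skip spacer or malformed rows
        values = [parse_number(cell) for cell in cells[1:]]
        data[cells[0]] = dict(zip(periods, values))
    return data


if __name__ == "__main__":
    for label, by_period in parse_financials(SAMPLE_HTML).items():
        print(label, by_period)
```

Raising an error when the table is missing is deliberate: since scrapers break without notice, it is usually better to fail loudly than to rank companies on silently empty data.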