I have tried this before. I'm completely at a loss for ideas.
On this page this dialog box to qet quotes. http://www.schwab.com/public/schwab/non_navigable/marketing/email/get_quote.html?
I used SPY, XLV, IBM, MSFT
The output is the above with a table.
If you have an account the quote are real time --- via cookie.
How do I get the table into python using 2.6. The data as list or dictionary
Use something like Beautiful Soup to parse the HTML response from the web site and load it into a dictionary. use the symbol as the key and a tuple of whatever data you're interested in as the value. Iterate over all the symbols returned and add one entry per symbol.
You can see examples of how to do this in Toby Segaran's "Programming Collective Intelligence". The samples are all in Python.
First problem: the data is actually in an iframe in a frame; you need to be looking at https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=APC (where you substitute the appropriate symbol on the end of the URL).
Second problem: extracting the data from the page. I personally like lxml and xpath, but there are many packages which will do the job. I would probably expect some code like
import urllib2
import lxml.html
import re
re_dollars = '\$?\s*(\d+\.\d{2})'
def urlExtractData(url, defs):
"""
Get html from url, parse according to defs, return as dictionary
defs is a list of tuples ("name", "xpath", "regex", fn )
name becomes the key in the returned dictionary
xpath is used to extract a string from the page
regex further processes the string (skipped if None)
fn casts the string to the desired type (skipped if None)
"""
page = urllib2.urlopen(url) # can modify this to include your cookies
tree = lxml.html.parse(page)
res = {}
for name,path,reg,fn in defs:
txt = tree.xpath(path)[0]
if reg != None:
match = re.search(reg,txt)
txt = match.group(1)
if fn != None:
txt = fn(txt)
res[name] = txt
return res
def getStockData(code):
url = 'https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=' + code
defs = [
("stock_name", '//span[@class="header1"]/text()', None, str),
("stock_symbol", '//span[@class="header2"]/text()', None, str),
("last_price", '//span[@class="neu"]/text()', re_dollars, float)
# etc
]
return urlExtractData(url, defs)
When called as
print repr(getStockData('MSFT'))
it returns
{'stock_name': 'Microsoft Corp', 'last_price': 25.690000000000001, 'stock_symbol': 'MSFT:NASDAQ'}
Third problem: the markup on this page is presentational, not structural - which says to me that code based on it will likely be fragile, ie any change to the structure of the page (or variation between pages) will require reworking your xpaths.
Hope that helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With