
Download prices with Python

Tags:

python

I have tried this before. I'm completely at a loss for ideas.

This page has a dialog box to get quotes: http://www.schwab.com/public/schwab/non_navigable/marketing/email/get_quote.html?

I used SPY, XLV, IBM, MSFT

The output is the same page with a table of quote data added.

If you have an account, the quotes are real time (via a cookie).

How do I get the table into Python 2.6, with the data as a list or dictionary?

Merlin asked Dec 16 '22 19:12


2 Answers

Use something like Beautiful Soup to parse the HTML response from the web site and load it into a dictionary: use the symbol as the key and a tuple of whatever data you're interested in as the value. Iterate over all the symbols returned and add one entry per symbol.

You can see examples of how to do this in Toby Segaran's "Programming Collective Intelligence". The samples are all in Python.
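For example, here is a minimal sketch of that approach (it uses bs4-style calls, and the symbol / company name / last price column order is an assumption on my part, not taken from the actual Schwab page; inspect the real quote table and adjust the indexing):

import urllib2
from bs4 import BeautifulSoup

def quotesToDict(url, symbols):
    """Scrape an HTML quote table into {symbol: (company_name, last_price)}."""
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    quotes = {}
    for row in soup.find_all('tr'):      # every table row on the page
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        # assumed column order: symbol, company name, last price
        if len(cells) >= 3 and cells[0] in symbols:
            quotes[cells[0]] = (cells[1], float(cells[2].lstrip('$').replace(',', '')))
    return quotes

Call it with whichever URL actually serves the table (see the next answer about the iframe) and the list of symbols you care about.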

duffymo answered Jan 06 '23 22:01


First problem: the data is actually in an iframe in a frame; you need to be looking at https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=APC (where you substitute the appropriate symbol on the end of the URL).

Second problem: extracting the data from the page. I personally like lxml and xpath, but there are many packages that will do the job. I would probably expect some code like this:

import urllib2
import lxml.html
import re
re_dollars = r'\$?\s*(\d+\.\d{2})'  # matches prices like "$25.69" or "25.69"

def urlExtractData(url, defs):
    """
    Get html from url, parse according to defs, return as dictionary

    defs is a list of tuples ("name", "xpath", "regex", fn )
      name becomes the key in the returned dictionary
      xpath is used to extract a string from the page
      regex further processes the string (skipped if None)
      fn casts the string to the desired type (skipped if None)
    """

    page = urllib2.urlopen(url) # can modify this to include your cookies
    tree = lxml.html.parse(page)

    res = {}
    for name,path,reg,fn in defs:
        txt = tree.xpath(path)[0]

        if reg is not None:
            match = re.search(reg,txt)
            txt = match.group(1)

        if fn is not None:
            txt = fn(txt)

        res[name] = txt

    return res

def getStockData(code):
    url = 'https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=' + code
    defs = [
        ("stock_name", '//span[@class="header1"]/text()', None, str),
        ("stock_symbol", '//span[@class="header2"]/text()', None, str),
        ("last_price", '//span[@class="neu"]/text()', re_dollars, float)
        # etc
    ]
    return urlExtractData(url, defs)

When called as

print repr(getStockData('MSFT'))

it returns

{'stock_name': 'Microsoft Corp', 'last_price': 25.690000000000001, 'stock_symbol': 'MSFT:NASDAQ'}

Third problem: the markup on this page is presentational, not structural, which says to me that code based on it will likely be fragile; i.e. any change to the structure of the page (or variation between pages) will require reworking your xpaths.
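If that fragility matters, one way to soften it is to tolerate missing matches instead of crashing. Here is a sketch of a variant of the function above (it reuses the same imports and defs format; the None fallback and the function name are my own choices, not part of the original code):

def urlExtractDataSafe(url, defs):
    """Like urlExtractData, but records None when an xpath no longer matches."""
    page = urllib2.urlopen(url)
    tree = lxml.html.parse(page)

    res = {}
    for name, path, reg, fn in defs:
        nodes = tree.xpath(path)
        if not nodes:
            res[name] = None   # page layout changed; note the gap and keep going
            continue

        txt = nodes[0]
        if reg is not None:
            match = re.search(reg, txt)
            txt = match.group(1) if match else txt

        if fn is not None:
            txt = fn(txt)

        res[name] = txt

    return res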

Hope that helps!

Hugh Bothwell answered Jan 06 '23 22:01