
source of historical stock data [closed]

I'm trying to make a stock market simulator (perhaps eventually growing into a predicting AI), but I'm having trouble finding data to use. I'm looking for a (hopefully free) source of historical stock market data.

Ideally, it would be a very fine-grained (second or minute interval) data set with price and volume of every symbol on NASDAQ and NYSE (and perhaps others if I get adventurous). Does anyone know of a source for such info?

I found this question, which indicates that Yahoo offers historical data in CSV format, but a cursory look at the linked site didn't show me how to get it.

I also don't like the idea of downloading the data piecemeal in CSV files... I imagine Yahoo would get upset and shut me off after the first few thousand requests.

I also discovered another question that made me think I'd hit the jackpot, but unfortunately that OpenTick site seems to have closed its doors... too bad, since I think they were exactly what I wanted.

I'd also be able to use data that's just open/close price and volume of every symbol every day, but I'd prefer all the data if I can get it. Any other suggestions?

rmeador asked Apr 16 '09 03:04



2 Answers

Let me add my 2¢. It's my job to get good, clean data for a hedge fund, and I've seen quite a lot of data feeds and historical data providers. This is mainly about US stock data.

To start with, if you have some money, don't bother with downloading data from Yahoo. Get the end-of-day (EOD) data straight from CSI Data, which is where Yahoo gets its EOD data as well, AFAIK. They have an API that lets you extract the data to whatever format you want. I think the yearly subscription is a few hundred dollars.

The main problem with downloading data from a free service is that you only get stocks that still exist. This is called survivorship bias, and it can give you wrong results if you look at many stocks, because you'll only include the ones that have made it so far and not the ones that were delisted.
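To make the effect concrete, here is a minimal sketch in Python. The symbols and return figures are invented purely for illustration and are not real market data. It compares the average return of a full universe against the same universe with delisted stocks dropped.

```python
from statistics import mean

# Hypothetical total returns over one period. All symbols and numbers
# are made up for illustration; none come from a real data feed.
universe = {
    "AAA": {"total_return": 0.40, "delisted": False},
    "BBB": {"total_return": 0.10, "delisted": False},
    "CCC": {"total_return": -0.90, "delisted": True},
    "DDD": {"total_return": -1.00, "delisted": True},
}


def average_return(stocks):
    """Equal-weighted average of the total returns in `stocks`."""
    return mean(s["total_return"] for s in stocks.values())


# What a survivors-only data set would contain.
survivors = {sym: s for sym, s in universe.items() if not s["delisted"]}

full_avg = average_return(universe)       # includes delisted stocks
survivor_avg = average_return(survivors)  # what the biased data set shows

print(f"full universe:  {full_avg:+.2%}")
print(f"survivors only: {survivor_avg:+.2%}")
```

With these invented numbers the survivors-only average is clearly positive while the full universe loses money, which is the kind of distortion a backtest on survivor-only data can hide.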

For playing around with some intraday data I'd look into IQFeed. They provide several APIs to extract historical data, although they are mainly an outfit for real-time feeds. There are quite a few options here, and some brokers even provide historical data downloads via their APIs, so just pick your poison.

BUT usually all of this data is not very clean. Once you really start backtesting you'll see that certain stocks are missing or appear as two different symbols, or stock splits are not properly accounted for, etc. Then you realize that historical dividend data is needed as well, and so you start running in circles, patching data together from 100 different data sources and so on. So a "discount" data feed will do to start with, but as soon as you run more comprehensive backtests you might run into problems, depending on what you do. If you just look at, let's say, the S&P 500 stocks, this will not be so much of a problem, and a "cheap" intraday feed will do.
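As an illustration of the stock-split problem, here is a small Python sketch of back-adjusting raw closing prices. The function name, the input layout, and the sample prices are assumptions made for this example and do not describe any particular vendor's format. The idea is that every price before a split's ex-date is divided by the split ratio, so a 2-for-1 split does not look like a 50% crash.

```python
from datetime import date


def back_adjust_for_splits(closes, splits):
    """Return split-adjusted closes.

    closes: list of (date, raw_close) tuples.
    splits: list of (ex_date, ratio) tuples, e.g. ratio 2.0 for a 2-for-1.
    Prices dated before an ex-date are divided by that split's ratio.
    (Hypothetical helper for illustration; dividends are ignored.)
    """
    adjusted = []
    for day, close in closes:
        factor = 1.0
        for ex_date, ratio in splits:
            if day < ex_date:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted


# Invented sample: a 2-for-1 split takes effect on 2020-01-06.
raw = [
    (date(2020, 1, 2), 100.0),
    (date(2020, 1, 3), 102.0),
    (date(2020, 1, 6), 51.5),
]
splits = [(date(2020, 1, 6), 2.0)]

adjusted = back_adjust_for_splits(raw, splits)
for day, close in adjusted:
    print(day, close)
```

In the raw series the price appears to fall from 102.0 to 51.5 overnight. After adjustment the series reads 50.0, 51.0, 51.5, which is what a backtest should see.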

What you will not find is free intraday data. I mean, you might find some samples; I'm sure there are 5 years of MSFT tick data floating around somewhere, but that will not get you very far.

Then, if you need the real stuff (level II order book, all ticks as they happened at all exchanges), one "affordable" yet excellent option is Nanex. They'll actually ship you a drive with terabytes of data. If I remember right, it's about $3k-4k per year of data. But trust me, once you understand how hard it is to get good intraday data, you won't think this is very much money at all.

Not to discourage you, but getting good data is hard, so hard in fact that many hedge funds and banks spend hundreds of thousands of dollars a month to get data they can trust. Again, you can start somewhere and go from there, but it's good to see it in context.


Edit: The answer above is from my own experience. This write-up from Caltech about available data feeds will give more insights, and especially recommends QuantQuote.

lukebuehler answered Oct 10 '22 00:10


THIS ANSWER IS NO LONGER ACCURATE AS THE YAHOO FEED HAS CEASED TO EXIST

Using Yahoo's CSV approach above you can also get historical data! You can reverse engineer the following example:

http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv

Essentially:

s = TICKER
a = fromMonth-1
b = fromDay (two digits)
c = fromYear
d = toMonth-1
e = toDay (two digits)
f = toYear
g = d for day, m for month, y for yearly
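The mapping above can be sketched as a small URL builder in Python. Since the Yahoo feed has ceased to exist, this only constructs the query string and does not fetch anything. The function name `build_history_url` is made up for this example, and the parameter order simply mirrors the sample URL.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the example URL; it no longer responds.
BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def build_history_url(ticker, start, end, interval="d"):
    """Build the legacy query URL (hypothetical helper, illustration only).

    Months are zero-based (January = 0) and days are zero-padded to
    two digits, following the parameter description.
    """
    params = [
        ("s", ticker),
        ("d", end.month - 1),       # toMonth-1
        ("e", f"{end.day:02d}"),    # toDay
        ("f", end.year),            # toYear
        ("g", interval),            # "d" for daily data
        ("a", start.month - 1),     # fromMonth-1
        ("b", f"{start.day:02d}"),  # fromDay
        ("c", start.year),          # fromYear
        ("ignore", ".csv"),
    ]
    return BASE_URL + "?" + urlencode(params)


# Reproduces the example: April 12, 1996 through January 28, 2010.
url = build_history_url("YHOO", date(1996, 4, 12), date(2010, 1, 28))
print(url)
```

Reading the example URL this way shows that `a=3&b=12&c=1996` means April 12, 1996 and `d=0&e=28&f=2010` means January 28, 2010.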

The complete list of parameters (these codes are quote data fields, distinct from the date parameters above):

a   Ask
a2  Average Daily Volume
a5  Ask Size
b   Bid
b2  Ask (Real-time)
b3  Bid (Real-time)
b4  Book Value
b6  Bid Size
c   Change & Percent Change
c1  Change
c3  Commission
c6  Change (Real-time)
c8  After Hours Change (Real-time)
d   Dividend/Share
d1  Last Trade Date
d2  Trade Date
e   Earnings/Share
e1  Error Indication (returned for symbol changed / invalid)
e7  EPS Estimate Current Year
e8  EPS Estimate Next Year
e9  EPS Estimate Next Quarter
f6  Float Shares
g   Day's Low
h   Day's High
j   52-week Low
k   52-week High
g1  Holdings Gain Percent
g3  Annualized Gain
g4  Holdings Gain
g5  Holdings Gain Percent (Real-time)
g6  Holdings Gain (Real-time)
i   More Info
i5  Order Book (Real-time)
j1  Market Capitalization
j3  Market Cap (Real-time)
j4  EBITDA
j5  Change From 52-week Low
j6  Percent Change From 52-week Low
k1  Last Trade (Real-time) With Time
k2  Change Percent (Real-time)
k3  Last Trade Size
k4  Change From 52-week High
k5  Percent Change From 52-week High
l   Last Trade (With Time)
l1  Last Trade (Price Only)
l2  High Limit
l3  Low Limit
m   Day's Range
m2  Day's Range (Real-time)
m3  50-day Moving Average
m4  200-day Moving Average
m5  Change From 200-day Moving Average
m6  Percent Change From 200-day Moving Average
m7  Change From 50-day Moving Average
m8  Percent Change From 50-day Moving Average
n   Name
n4  Notes
o   Open
p   Previous Close
p1  Price Paid
p2  Change in Percent
p5  Price/Sales
p6  Price/Book
q   Ex-Dividend Date
r   P/E Ratio
r1  Dividend Pay Date
r2  P/E Ratio (Real-time)
r5  PEG Ratio
r6  Price/EPS Estimate Current Year
r7  Price/EPS Estimate Next Year
s   Symbol
s1  Shares Owned
s7  Short Ratio
t1  Last Trade Time
t6  Trade Links
t7  Ticker Trend
t8  1 yr Target Price
v   Volume
v1  Holdings Value
v7  Holdings Value (Real-time)
w   52-week Range
w1  Day's Value Change
w4  Day's Value Change (Real-time)
x   Stock Exchange
y   Dividend Yield

Fredrik E answered Oct 10 '22 00:10