 

Downloading and accessing data from GitHub in Python

Tags:

git

python

pandas

Hi, I'm working through Python for Data Analysis and I'd like to analyze the data the author uses in the book. In chapter 9 he uses the data linked below. However, I'm having a hard time figuring out how to use the data in my IPython notebook once I've downloaded it with the GitHub application on my Mac.

The stock data is here: https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv

I clicked "Open", which downloaded a large file into my GitHub application. How do I get this data to open in my IPython notebook?

From other Stack Overflow questions, I know I can just download the ZIP file, which I am also doing. Still, it would be good to know how to use the GitHub application efficiently.

Right-clicking and saving the CSV file seems to save the JSON/HTML page rather than the CSV data itself.


asked May 05 '14 by user3314418




1 Answer

You should be able to use the URL of the raw version of the file (the "Raw" button on the page you linked points to it) and read it directly into a DataFrame with read_csv:

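import pandas as pd

# Raw-file URL (behind the "Raw" button), not the HTML page URL
url = 'https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv'
# Use the first column as the index and parse it as dates
df = pd.read_csv(url, index_col=0, parse_dates=[0])

print(df.head(5))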

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
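If you start from the github.com page link (the blob URL in the question), the raw URL can be derived from it. This is a minimal sketch, assuming the link has the shape https://github.com/<owner>/<repo>/blob/<branch>/<path> and that the branch name contains no slash; blob_to_raw_url is a hypothetical helper name, not part of pandas or any GitHub tool:

```python
from urllib.parse import urlparse


def blob_to_raw_url(blob_url: str) -> str:
    """Rewrite a github.com 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper. Assumes the shape
    https://github.com/<owner>/<repo>/blob/<branch>/<path>
    with a branch name that contains no '/'.
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    parts = parsed.path.strip("/").split("/")
    # Expect: owner, repo, the literal 'blob', branch, then the file path.
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a blob URL: {blob_url}")
    owner, repo, _, branch = parts[:4]
    path = "/".join(parts[4:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


page_url = "https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv"
raw_url = blob_to_raw_url(page_url)
print(raw_url)
# https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv
```

The resulting string can go straight into pd.read_csv(raw_url, index_col=0, parse_dates=[0]). If the repository is already on disk, because you cloned it with the GitHub app or unzipped the ZIP download, read_csv also accepts a local file path; the exact path depends on where the clone lives.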

Edit: a brief explanation of the options I used to read in the file:

df = pd.read_csv(url,index_col=0,parse_dates=[0])

The first column (column 0) of the file holds dates, and because it has no column name it looks like it is meant to be the index. index_col=0 makes that column the index, and parse_dates=[0] tells read_csv to parse column 0 (the first column) as dates.
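To see what the two options change, here is a small sketch that reads three of the rows shown above from an in-memory string instead of the URL (using io.StringIO is an assumption made so the example runs offline), once without the options and once with them:

```python
import io

import pandas as pd

# Three rows from the output above, with the blank first header cell
# that the answer describes for the date column.
csv_text = """,AAPL,MSFT,XOM,SPX
2003-01-02,7.40,21.11,29.22,909.03
2003-01-03,7.45,21.14,29.24,908.59
2003-01-06,7.45,21.52,29.96,929.01
"""

# Without options: the unnamed date column becomes an ordinary column
# called "Unnamed: 0" holding the date text, and the index is 0..n-1.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns[0])             # Unnamed: 0
print(type(plain.index).__name__)   # RangeIndex

# With index_col=0 and parse_dates=[0]: column 0 becomes the index
# and is parsed into datetimes, so rows can be looked up by date.
dated = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=[0])
print(type(dated.index).__name__)   # DatetimeIndex
print(dated.loc[pd.Timestamp("2003-01-03"), "AAPL"])  # 7.45
```

Having a DatetimeIndex is what makes the time-series operations in that chapter of the book work directly on the DataFrame.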

answered Oct 15 '22 by Karl D.