
sheets of Excel Workbook from a URL into a `pandas.DataFrame`

After looking at different ways to read a URL pointing to an .xls file, I decided to go with xlrd.

I am having a difficult time converting an 'xlrd.book.Book' object to a 'pandas.DataFrame'.

I have the following:

import pandas
import xlrd 
import urllib2

link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
socket = urllib2.urlopen(link)

# this line gets me the Excel workbook
xlfile = xlrd.open_workbook(file_contents=socket.read())

# storing the sheets
sheets = xlfile.sheets()

I want to take the last sheet in sheets and import it as a pandas.DataFrame. Any ideas as to how I can accomplish this? I've tried pandas.ExcelFile.parse(), but it wants a path to an Excel file. I can certainly save the file first and then parse it (using tempfile or something), but I'm trying to follow Pythonic guidelines and use functionality likely already written into pandas.

Any guidance is greatly appreciated as always.

benjaminmgross asked Mar 23 '13 15:03



2 Answers

You can pass your socket to ExcelFile:

>>> import pandas as pd
>>> import urllib2
>>> link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
>>> socket = urllib2.urlopen(link)
>>> xd = pd.ExcelFile(socket)
NOTE *** Ignoring non-worksheet data named u'PDVPlot' (type 0x02 = Chart)
NOTE *** Ignoring non-worksheet data named u'ConsumptionPlot' (type 0x02 = Chart)
>>> xd.sheet_names
[u'Data', u'Consumption', u'Calculations']
>>> df = xd.parse(xd.sheet_names[-1], header=None)
>>> df
                                   0   1   2   3         4
0        Average Real Interest Rate: NaN NaN NaN  1.028826
1    Geometric Average Stock Return: NaN NaN NaN  0.065533
2              exp(geo. Avg. return) NaN NaN NaN  0.067728
3  Geometric Average Dividend Growth NaN NaN NaN  0.012025
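
For anyone on Python 3: urllib2 only exists on Python 2, and its counterpart there is urllib.request. Below is a minimal sketch of the same idea, assuming a reasonably recent pandas and that xlrd is installed for .xls files. The helper names fetch_bytes and parse_last_sheet are made up for this example and are not part of pandas.

```python
import io
import urllib.request

import pandas as pd


def fetch_bytes(url, timeout=30):
    """Download the raw bytes behind `url` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_last_sheet(workbook_bytes, **parse_kwargs):
    """Parse the last worksheet of an in-memory workbook into a DataFrame."""
    # BytesIO gives pandas a seekable file-like object, so nothing is written to disk.
    with pd.ExcelFile(io.BytesIO(workbook_bytes)) as workbook:
        # sheet_names is in workbook order, so [-1] is the last sheet.
        return workbook.parse(workbook.sheet_names[-1], **parse_kwargs)


# Example call (needs network access, and xlrd for the .xls format):
# link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
# df = parse_last_sheet(fetch_bytes(link), header=None)
```

Splitting the download from the parsing is only a convenience here: it lets you reuse the bytes or check them before handing them to pandas.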
DSM answered Sep 22 '22 19:09


You can pass a URL to pandas.read_excel():

import pandas as pd

link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
# replace 'sheetname' with the name of the sheet you want, e.g. 'Calculations'
data = pd.read_excel(link, 'sheetname')
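
If you don't know the sheet names up front, one option is to pass sheet_name=None, which makes read_excel return a dict of DataFrames keyed by sheet name. The sketch below picks the last entry, assuming the dict comes back in workbook order (that appears to be how pandas builds it, but treat it as an assumption). read_last_sheet is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def read_last_sheet(source, **read_kwargs):
    """Return (sheet_name, DataFrame) for the last sheet of a workbook.

    `source` can be anything read_excel accepts: a URL, a path, or a
    file-like object.
    """
    # sheet_name=None asks pandas for every sheet, as a dict keyed by name.
    sheets = pd.read_excel(source, sheet_name=None, **read_kwargs)
    # Assumption: the dict keys follow the order of sheets in the workbook.
    last_name = list(sheets)[-1]
    return last_name, sheets[last_name]


# Example call (needs network access, and xlrd for the .xls format):
# name, data = read_last_sheet('http://www.econ.yale.edu/~shiller/data/chapt26.xls', header=None)
```

This does read every sheet, so for a large workbook the ExcelFile route is probably cheaper.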
aghazaly answered Sep 21 '22 19:09