Getting Google Spreadsheet CSV into A Pandas Dataframe

Tags:

I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...

900

asked Oct 26 '13 20:10

dartdog

4 Answers

Seems to work for me without the StringIO:

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + 
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including the ?gid= enables importing different sheets, find the gid in the URL.

161

answered Oct 13 '22 06:10

Max Ghenis

You can use read_csv() on a StringIO object:

from io import BytesIO

import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
    
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4

answered Oct 13 '22 06:10

TomAugspurger

Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.

answered Oct 13 '22 06:10

Ken Arnold

My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!

gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())

answered Oct 13 '22 05:10

Abhery Guha

Related questions
                            
                                How to use openCV's connected components with stats in python?
                            
                                how to get request object in django unit testing?
                            
                                Drop all data in a pandas dataframe
                            
                                catching SQLAlchemy exceptions
                            
                                How to launch python Idle from a virtual environment (virtualenv)
                            
                                How do I properly set the Datetimeindex for a Pandas datetime object in a dataframe?
                            
                                How do I check if keras is using gpu version of tensorflow?
                            
                                Getting the date of 7 days ago from current date in python [closed]
                            
                                AttributeError: Module Pip has no attribute 'main'
                            
                                How can I start ipython running a script?
                            
                                pandas read_csv index_col=None not working with delimiters at the end of each line
                            
                                How do I remove the background from this kind of image?
                            
                                Python: read all text file lines in loop
                            
                                no module named urllib.parse (How should I install it?)
                            
                                Testing if a pandas DataFrame exists
                            
                                Django - How to do tuple unpacking in a template 'for' loop
                            
                                How to find out if Python is compiled with UCS-2 or UCS-4?
                            
                                How can I install the Python library 'gevent' on Mac OS X Lion
                            
                                Python: How to use RegEx in an if statement?
                            
                                What do these operators mean (** , ^ , %, //)? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Getting Google Spreadsheet CSV into A Pandas Dataframe

Tags:

python

pandas

google-sheets

google-drive-api

google-apps

dartdog

People also ask

4 Answers

Max Ghenis

TomAugspurger

Ken Arnold

Abhery Guha

Recent Activity

Donate For Us