Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting Google Spreadsheet CSV into A Pandas Dataframe

I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...

like image 900
dartdog Avatar asked Oct 26 '13 20:10

dartdog


People also ask

How do I import data from Google Sheets to Jupyter?

Connecting Google Sheets to Jupyter Notebook 2. Create a new worksheet in your Google Sheet. Then click the share button on the top right and input the email from the Service Account in your google API dashboard. This to make sure that the Worksheet is connected to our Jupyter Notebook.

How to read Google Sheets data into a pandas Dataframe?

I want to quickly share two ways of reading Google Sheets data into a pandas DataFrame through the .read_csv () method without having to safe the sheet locally first or having to use the GoogleAPI. All you need is a Google Sheets file with one or more sheets and of course some data.

How do I read data from a sheet to a Dataframe?

The sheet name is the name you gave your sheet. After you’ve specified the id and name you can simply use the appropriately formatted url string to read the data into a pandas DataFrame (see below). If your file is simply one sheet with data you can directly copy and paste the url into your code and just replace one part by a set expression.

How do I import data from Google Sheets to Python?

A Google Account (and Google Sheet containing your data-of-interest). First, you’ll need to enable the Google Sheets API on your Gmail account, where the Google Sheet is stored. Login to your Gmail account and visit the Google Sheets API QuickStart Guide for Python. You’ll see a blue “Enable Google Sheets API” button.

How do I pull data only from a specific range in pandas?

DATA_TO_PULL = 'spreadsheet_tab_name' or #Pulls data only from the specified range of cells. Finally, we’ll bring all of our code together by specifying the pull_sheet_data () parameters, running the functions, and then storing the retrieved data into a Pandas DataFrame.


4 Answers

Seems to work for me without the StringIO:

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + 
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including the ?gid= enables importing different sheets, find the gid in the URL.

like image 161
Max Ghenis Avatar answered Oct 13 '22 06:10

Max Ghenis


You can use read_csv() on a StringIO object:

from io import BytesIO

import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
    
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4  
like image 44
TomAugspurger Avatar answered Oct 13 '22 06:10

TomAugspurger


Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.

like image 19
Ken Arnold Avatar answered Oct 13 '22 06:10

Ken Arnold


My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!

gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())
like image 11
Abhery Guha Avatar answered Oct 13 '22 05:10

Abhery Guha