Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieve data from public Google Spreadsheet using gdata library?

I'm working in Python and trying to retrieve data from a public Google Spreadsheet (this one) but struggling a bit with the developer documentation.

I'd like to avoid client authentication if possible, as it's a public spreadsheet.

Here's my current code, with the gdata library:

client = gdata.spreadsheet.service.SpreadsheetsService()  
key = '0Atncguwd4yTedEx3Nzd2aUZyNmVmZGRHY3Nmb3I2ZXc'  
worksheets_feed = client.GetWorksheetsFeed(key)  

This fails on line 3 with BadStatusLine.

How can I read in the data from the spreadsheet?

like image 976
Richard Avatar asked Sep 26 '11 20:09

Richard


1 Answers

I want to start out by echoing your sentiment that the Documentation is really poor. But, here's what I've been able to figure out so far.

Published of Public

It is very important that your spreadsheet be "Published to The Web" as opposed to just being "Public on the web." The first is achieved by going to the "File -> Publish to The Web ..." menu item. The second is achieved by clicking the "Share" button in the upper left-hand corner of the spreadsheet.

I checked, and your spreadsheet with key = '0Atncguwd4yTedEx3Nzd2aUZyNmVmZGRHY3Nmb3I2ZXc' is only "Public on the web." I made a copy of it to play around with for my example code. My copy has a key = '0Aip8Kl9b7wdidFBzRGpEZkhoUlVPaEg2X0F2YWtwYkE' which you will see in my sample code later.

This "Public on the Web" vs. "Published on The Web" nonsense is obviously a point of common confusion. It is actually documented in a red box in the "Visibilities and Projections" sections of the main API documentation. However, it is really hard to read that document.

Visibility and Projections

As that same document says, there are projections other than "full." And in fact (undocumented), "full" doesn't seem to play nicely with a visibility of "public" which is also important to set when making unauthenticated calls.

You can kind of glean from the pydocs that many of the methods on the SpreadsheetsService object can take "visibility" and "projection" parameters. I know only of "public" and "private" visibilities. If you learn of any others, I'd like to know about them too. It seems that "public" is what you should use when making unauthenticated calls.

As for Projections, it is even more complicated. I know of "full", "basic", and "values" projections. I only got lucky and found the "values" projection by reading the source code to the excellent Tabletop javascript library. And, guess what, that's the secret missing ingredient to make things work.

Working Code

Here is some code you can use to query the worksheets from my copy of your spreadsheet.

#!/usr/bin/python
from gdata.spreadsheet.service import SpreadsheetsService

key = '0Aip8Kl9b7wdidFBzRGpEZkhoUlVPaEg2X0F2YWtwYkE'

client = SpreadsheetsService()
feed = client.GetWorksheetsFeed(key, visibility='public', projection='basic')

for sheet in feed.entry:
  print sheet.title.text

** Tips ** I find it really helpful when working with terribly documented python APIs to use the dir() method in a running python interpreter to find out more about the kind of information I can get from the python objects. In this case, it doesn't help too much because the abstraction above the XML and URL based API is pretty poor.

By the way, I'm sure you are going to want to start dealing with the actual data in the spreadsheet, so I'll go ahead and toss in one more pointer. The data for each row organized as a dictionary can be found using GetListFeed(key, sheet_key, visibility='public', projection='values').entry[0].custom

like image 136
Pepper Lebeck-Jobe Avatar answered Oct 23 '22 23:10

Pepper Lebeck-Jobe