 

How do you save a Google Sheets file as CSV from Python 3 (or 2)?

I am looking for a simple way to save a CSV file originating from a published Google Sheets document. Since it's published, it's accessible through a direct link (modified on purpose in the example below).

All my browsers will prompt me to save the csv file as soon as I launch the link.

Neither:

DOC_URL = 'https://docs.google.com/spreadsheet/ccc?key=0AoOWveO-dNo5dFNrWThhYmdYW9UT1lQQkE&output=csv'    

f = urllib.request.urlopen(DOC_URL)
cont = f.read(SIZE)
f.close()
cont = str(cont, 'utf-8')
print(cont)

, nor:

req = urllib.request.Request(DOC_URL)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))

prints anything but HTML content.

(I tried the second version after reading this other post: Download google docs public spreadsheet to csv with python.)

Any idea what I am doing wrong? I am logged out of my Google account, if that is worth anything, but the link works from every browser I tried. As far as I understand, the Google Docs API has not yet been ported to Python 3, and given the "toy" scale of my little personal project, it would not make much sense to use it from the get-go if I can avoid it.

In the second attempt I set the 'User-Agent' header because I thought requests that appear to come from scripts (because no identifying information is present) might be ignored, but it made no difference.

Asked by elder elder, Feb 28 '13 at 15:02




2 Answers

While the requests library is the gold standard for HTTP requests from Python, this style of download (fetching a link and managing cookies and redirects yourself), though not yet deprecated, is not likely to last. One reason links are not preferred is that they are less secure; such access should generally require authorization. Instead, the currently accepted way of exporting Google Sheets as CSV is to use the Google Drive API.

So why the Drive API? Isn't this supposed to be something for the Sheets API instead? Well, the Sheets API is for spreadsheet-oriented functionality, i.e., data formatting, column resize, creating charts, cell validation, etc., while the Drive API is for file-oriented functionality, i.e., import/export, copy, rename, etc.

Below is a complete command-line solution. (If you don't do Python, you can use it as pseudocode and pick any language supported by the Google APIs Client Libraries.) The snippet looks for the most recently modified Sheet named inventory (older files with that name are ignored) and assumes DRIVE is the API service endpoint:

import os

FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

# query for latest file named FILENAME
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

# if found, export Sheets file as CSV
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()

    # if non-empty file
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
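One detail the query above glosses over is what happens when the file name itself contains a quote character. Here is a minimal sketch of a helper that builds the same kind of q string. The helper name drive_name_query is hypothetical, and the escaping rule (single-quoted values, with embedded single quotes and backslashes escaped by a backslash) is my understanding of the Drive search syntax, so please check it against the Drive documentation before relying on it.

```python
def drive_name_query(name, mime_type):
    """Build a Drive `q` string matching an exact file name and MIME type.

    Assumption: Drive search values are wrapped in single quotes, and
    embedded backslashes and single quotes are escaped with a backslash.
    """
    def quote(value):
        # Escape backslashes first so the quote escapes are not doubled.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'%s'" % escaped

    return "name = %s and mimeType = %s" % (quote(name), quote(mime_type))


if __name__ == "__main__":
    print(drive_name_query("inventory",
                           "application/vnd.google-apps.spreadsheet"))
```

The returned string would be passed as the q argument of DRIVE.files().list(...) in place of the hand-formatted one.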

If your Sheet is large, you may have to export it in chunks; see this page on how to do that. If you're generally new to Google APIs, I have a (somewhat dated but) user-friendly intro video for you. (There are 2 videos after that which may be useful too.)

Answered by wescpy, Sep 18 '22 at 20:09


Google responds to the initial request with a series of cookie-setting 302 redirects. If you don't store and resubmit the cookies between requests, it redirects you to the login page.

So the problem is not the User-Agent header. It is that, by default, urllib.request.urlopen doesn't store cookies, even though it does follow the HTTP 302 redirects.

The following code works just fine on a public spreadsheet available at the location specified by DOC_URL:

>>> from http.cookiejar import CookieJar
>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open(DOC_URL)
>>> # should really parse resp.getheader('content-type') for encoding.
>>> csv_content = resp.read().decode('utf-8')
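If you want to see the cookie-and-redirect behavior for yourself without touching Google, it can be reproduced against a tiny local server. The sketch below is only an illustration: FakeSheetHandler, the /export and /download paths, and the session cookie are all made up and are not Google's actual flow. It also reads the charset from the Content-Type header rather than hard-coding utf-8.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener

CSV_BODY = b"item,qty\nwidget,3\n"
LOGIN_BODY = b"<html><body>Please sign in</body></html>"


class FakeSheetHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that sets a cookie, then redirects."""

    def do_GET(self):
        if self.path == "/export":
            # First hop: set a cookie and redirect, as described above.
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc; Path=/")
            self.send_header("Location", "/download")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif "session=abc" in self.headers.get("Cookie", ""):
            self._send(CSV_BODY, "text/csv; charset=utf-8")
        else:
            # No cookie came back with the request: serve a "login page".
            self._send(LOGIN_BODY, "text/html; charset=utf-8")

    def _send(self, body, content_type):
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, keep_cookies):
    """Return (content type, decoded body), with or without a cookie jar."""
    handlers = [ProxyHandler({})]  # bypass any system proxy for localhost
    if keep_cookies:
        handlers.append(HTTPCookieProcessor(CookieJar()))
    with build_opener(*handlers).open(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.headers.get_content_type(), resp.read().decode(charset)


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSheetHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/export" % server.server_port
    try:
        return fetch(url, keep_cookies=False), fetch(url, keep_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for content_type, body in demo():
        print(content_type, repr(body[:30]))
```

Both fetches follow the redirect; only the one with a cookie jar sends the cookie back on the second hop, so only that one gets the CSV.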

Having shown you how to do it in vanilla Python, I'll now say that the Right Way™ to go about this is to use the most-excellent requests library. It is extremely well documented and makes these sorts of tasks incredibly pleasant to complete.

For instance, to get the same csv_content as above using the requests library is as simple as:

>>> import requests
>>> csv_content = requests.get(DOC_URL).text

That single line expresses your intent more clearly. It's easier to write and easier to read. Do yourself - and anyone else who shares your codebase - a favor and just use requests.
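Since the original question was about saving the file, here is a minimal sketch of writing csv_content to disk once you have it as text, using only the standard library. The function name save_csv_text and the idea of returning the parsed rows are my own choices for illustration, not part of either approach above.

```python
import csv
import io


def save_csv_text(csv_content, path):
    """Write already-decoded CSV text to `path` and return the parsed rows."""
    # newline="" leaves line endings alone so the csv module can handle
    # quoted fields that contain line breaks.
    rows = list(csv.reader(io.StringIO(csv_content, newline="")))
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows


if __name__ == "__main__":
    sample = "item,qty\r\nwidget,3\r\n"
    print(save_csv_text(sample, "inventory.csv"))
```

Parsing before writing also gives you a cheap sanity check: if the rows look like a single column of HTML tags, you most likely got the login page instead of the spreadsheet.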

Answered by Don Spaulding, Sep 17 '22 at 20:09