How do you save a Google Sheets file as CSV from Python 3 (or 2)?

Tags: python, csv, google-drive-api, google-api-python-client, google-sheets-api

I am looking for a simple way to save a CSV file from a published Google Sheets document. Since it's published, it's accessible through a direct link (modified on purpose in the example below).

All my browsers prompt me to save the CSV file as soon as I open the link.

Neither:

import urllib.request

DOC_URL = 'https://docs.google.com/spreadsheet/ccc?key=0AoOWveO-dNo5dFNrWThhYmdYW9UT1lQQkE&output=csv'

f = urllib.request.urlopen(DOC_URL)
cont = f.read()
f.close()
cont = str(cont, 'utf-8')
print(cont)

nor:

req = urllib.request.Request(DOC_URL)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))

prints anything but HTML content.

(I tried the second version after reading this other post: Download google docs public spreadsheet to csv with python.)

Any idea what I am doing wrong? I am logged out of my Google account, if that matters, but the link works from every browser I tried. As far as I understand, the Google Docs API has not yet been ported to Python 3, and given the "toy" scale of my little personal project, it would not make much sense to use it from the get-go if I can avoid it.

In the second attempt, I set the 'User-Agent' header because I thought requests that appear to come from scripts (since no identifying information is present) might be ignored, but it made no difference.


asked Feb 28 '13 15:02

elder elder

2 Answers

While the requests library is the gold standard for HTTP requests from Python, this style of download (using links, managing cookies and redirects, etc.) is not deprecated yet but is not likely to last. One reason not to rely on links is that they are less secure; such access should generally require authorization. Instead, the currently accepted way to export Google Sheets as CSV is the Google Drive API.

So why the Drive API? Isn't this supposed to be something for the Sheets API instead? Well, the Sheets API is for spreadsheet-oriented functionality, e.g., data formatting, column resizing, creating charts, cell validation, etc., while the Drive API is for file-oriented functionality, e.g., import/export, copy, rename, etc.

Below is the core of a command-line solution. (If you don't use Python, you can treat it as pseudocode and pick any language supported by the Google APIs Client Libraries.) For the code snippet, assume the target is the most recently modified Sheet named inventory (older files with that name are ignored) and that DRIVE is the API service endpoint:

import os

# DRIVE is assumed to be an authorized Drive API v3 service object (creating it is not shown)
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

# query for latest file named FILENAME
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

# if found, export Sheets file as CSV
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()

    # if non-empty file, write the exported bytes to disk
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
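One way to make the snippet reusable and testable is to wrap it in a function that receives the service object as a parameter instead of reading a global DRIVE. This is a sketch, not the original answer's code: the function name export_latest_sheet_as_csv and the out_dir parameter are hypothetical, and drive is assumed to be an authorized Drive API v3 service object. The files().list() and files().export() calls are the same ones used above.

```python
import os

SRC_MIMETYPE = "application/vnd.google-apps.spreadsheet"
DST_MIMETYPE = "text/csv"


def export_latest_sheet_as_csv(drive, filename, out_dir="."):
    """Export the most recently modified Sheet named `filename` as CSV.

    `drive` is assumed to be an authorized Drive API v3 service object; it is
    passed in (hypothetical design) so the logic can be exercised with a stand-in.
    Returns the path written, or None if no file was found or the export was empty.
    """
    files = drive.files().list(
        q='name="%s" and mimeType="%s"' % (filename, SRC_MIMETYPE),
        orderBy="modifiedTime desc,name",
    ).execute().get("files", [])
    if not files:
        return None  # no Sheet with that name

    latest = files[0]  # newest first, thanks to orderBy
    base = os.path.splitext(latest["name"].replace(" ", "_"))[0]
    path = os.path.join(out_dir, base + ".csv")

    data = drive.files().export(fileId=latest["id"], mimeType=DST_MIMETYPE).execute()
    if not data:
        return None  # empty export: nothing to write

    with open(path, "wb") as f:
        f.write(data)  # export() returns the CSV content as bytes
    return path
```

Passing the service in keeps the Drive-specific setup (credentials, scopes) out of the export logic, so the same function works from a script or a test.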

If your Sheet is large, you may have to export it in chunks (files.export is limited to 10 MB of exported content); see this page on how to do that. If you're new to Google APIs in general, I have a (somewhat dated but) user-friendly intro video for you. (There are 2 videos after that which may be useful too.)

answered Sep 18 '22 20:09

wescpy

Google responds to the initial request with a series of cookie-setting 302 redirects. If you don't store and resubmit the cookies between requests, it redirects you to the login page.

So the problem is not the User-Agent header; it's that by default, urllib.request.urlopen doesn't store cookies, although it does follow the HTTP 302 redirects.
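To see this mechanism without touching Google's servers, here is a self-contained sketch. It starts a hypothetical local server: FakeSheetsHandler, the /auth and /login paths, and the session cookie are all made up for illustration, and Google's real redirect chain differs. The server sets a cookie on a 302 and sends clients that don't resubmit it to a "login page". An opener without a cookie processor (which is what urlopen uses by default) ends up with the HTML login page, while one built with HTTPCookieProcessor(CookieJar()) reaches the CSV.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class FakeSheetsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a published-sheet URL (not Google's real flow)."""

    def do_GET(self):
        parts = urlsplit(self.path)
        has_cookie = "session=ok" in (self.headers.get("Cookie") or "")
        if parts.path == "/auth":
            # Set the session cookie on a redirect back to the document.
            self._redirect("/doc?checked=1", set_cookie="session=ok; Path=/")
        elif parts.path == "/doc" and has_cookie:
            self._send("text/csv; charset=utf-8", "item,qty\nwidget,3\n")
        elif parts.path == "/doc" and parts.query == "checked=1":
            # The cookie was not resubmitted: send the client to the login page.
            self._redirect("/login")
        elif parts.path == "/doc":
            self._redirect("/auth")
        else:
            self._send("text/html; charset=utf-8", "<html>login page</html>")

    def _redirect(self, location, set_cookie=None):
        self.send_response(302)
        self.send_header("Location", location)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def _send(self, content_type, body):
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), FakeSheetsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
DOC_URL = "http://127.0.0.1:%d/doc" % server.server_port

# ProxyHandler({}) only keeps any system proxy settings away from this local demo.
plain = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with plain.open(DOC_URL) as resp:
    without_cookies = resp.read().decode("utf-8")  # follows 302s, drops the cookie

jar_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), urllib.request.HTTPCookieProcessor(CookieJar()))
with jar_opener.open(DOC_URL) as resp:
    with_cookies = resp.read().decode("utf-8")  # stores and resubmits the cookie

server.shutdown()
server.server_close()
print(without_cookies)
print(with_cookies)
```

Both openers follow exactly the same redirects; the only difference is whether the cookie set partway through the chain is sent back on the next request.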

The following code works just fine on a public spreadsheet available at the location specified by DOC_URL:

>>> from http.cookiejar import CookieJar
>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open(DOC_URL)
>>> # should really parse resp.getheader('content-type') for encoding.
>>> csv_content = resp.read().decode('utf-8')
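The comment about parsing the Content-Type header can be handled with the standard library. This is a sketch, and the helper names charset_from_content_type and parse_csv_body are hypothetical. With a live response you can call resp.headers.get_content_charset('utf-8') directly, because resp.headers is an http.client.HTTPMessage (an email.message.Message subclass). The helper below does the same from a plain header string so it can be tried offline, then splits the decoded text into rows with the csv module.

```python
import csv
import io
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type header (lowercased), else `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def parse_csv_body(body, content_type):
    """Decode a CSV response body using its declared charset and split it into rows."""
    text = body.decode(charset_from_content_type(content_type))
    return list(csv.reader(io.StringIO(text)))
```

For example, parse_csv_body(resp.read(), resp.getheader('content-type')) would replace the hard-coded 'utf-8' decode in the session above.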

Having shown you how to do it in vanilla Python, I'll now say that the Right Way™ to go about this is to use the most excellent requests library. It is extremely well documented and makes these sorts of tasks incredibly pleasant to complete.

For instance, to get the same csv_content as above using the requests library is as simple as:

>>> import requests
>>> csv_content = requests.get(DOC_URL).text

That single line expresses your intent more clearly. It's easier to write and easier to read. Do yourself, and anyone else who shares your codebase, a favor and just use requests.

answered Sep 17 '22 20:09

Don Spaulding
