Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Authenticate to Google Drive and download spreadsheet with Python urllib2/requests

I would like to download a document I have in my Google Drive authenticating to Google (I only want certain users to be able to access it and do not want to publish it on the web).

I have tried using requests but apparently I am doing something wrong. From a browser I can download my document going to the address https://docs.google.com/spreadsheets/d/<document key>/export?format=xls.

So in my python script I do the following:

import os
import requests
import shutil
from requests.auth import HTTPBasicAuth

remote = "https://docs.google.com/spreadsheets/d/<document key>/export?format=xls"
username = os.environ['GOOGLEUSERNAME']
password = os.environ['GOOGLEPASSWORD']
r = requests.get(remote, auth=HTTPBasicAuth(username,password))
if r.status_code == 200:
  with open("document.xls","wb") as f:
    shutil.copyfileobj(r.raw, f)

however the resulting document.xls is empty.

What am I doing wrong?

like image 285
lucacerone Avatar asked Mar 17 '23 06:03

lucacerone


2 Answers

It might actually be possible what you are trying to do, but here are some reasons why it will be non-trivial(by no means a complete list):

  1. Google is usually blocking user-agents that are non-browsers(like your Python script) for browser intended content (for security reasons); you would have to spoof it, which is actually easy
  2. Multi-factor authentication - you would have to turn that off (easy, but you open yourself up for being hacked...)
  3. Session-cookie - aka security cookie; (not so easy to get ahold of)

What you should do instead

Use the official google-drive API. Also, the Python client library has a nice tutorial and this page describes how to download files from google-drive.

If you want to write even less code, then libraries like PyDrive will make your live even easier.

Hope this helps!

like image 176
Marco Pashkov Avatar answered Apr 08 '23 19:04

Marco Pashkov


I might have a simple solution for you, depending on what exactly the auth requirements are. You are saying

I only want certain users to be able to access it and do not want to publish it on the web

From this statement alone, it may be sufficient for you to create a "secret" link for your document, and share this among your users. You can then easily retrieve this document automatically, for instance with wget, and specify the format, e.g. csv:

wget -O data.csv "https://docs.google.com/spreadsheets/d/***SHARED-SECRET***/export?format=csv"

Or, in Python (2):

import urllib2
from cookielib import CookieJar

spreadsheet_url = "https://docs.google.com/spreadsheets/d/***SHARED-SECRET***/export?format=csv"
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
response = opener.open(spreadsheet_url) 
with open("data.csv", "wb") as f:
    f.write(response.read())

I am actually using that in production, it works reliably, without faking the user agent.

like image 24
Dr. Jan-Philip Gehrcke Avatar answered Apr 08 '23 19:04

Dr. Jan-Philip Gehrcke