Using Python, how can I read plain text from a Google Doc?

Question

I am attempting to read the raw text/content of a Google Doc (just a plain document, not a spreadsheet or presentation) from within a Python script, but so far have had little success.

Here's what I've tried:

import gdata.docs.service
client = gdata.docs.service.DocsService()
client.ClientLogin('email', 'password')
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('email', 'Folder Name')
feed = client.Query(q.ToUri())
doc = feed.entry[0] # extract one of the documents

However, this variable doc, which is of type gdata.docs.DocumentListEntry, doesn't seem to contain any content, just meta information about the document.

Am I doing something wrong here? Can somebody point me in the right direction? Thank you!

wescpy · Accepted Answer

UPDATE (Mar 2019): Good news! The Google Docs REST API is now available. There is more about it in my SO answer to a similar question, but to get you going, here's the official Python "quickstart" sample, which shows how to get the title of a Google Doc in plain text.

Both the Apps Script and Drive REST API solutions originally answered below are still valid and are alternate ways to get the contents of a Google Doc. (The Drive API works on both Python 2 & 3, but Apps Script is JavaScript-only.)

Bottom line: if you want to download the entire Doc in plain text, the Drive API solution is best. If you want to programmatically CRUD different parts of a Doc, then you must use either the Docs API or Apps Script.
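As a rough sketch of the Docs API route, the helper below walks the JSON that documents().get() returns and joins the text runs into plain text. The function name extract_text and the sample dictionary are illustrative assumptions, not part of the official quickstart. Only paragraph text is handled, so tables, headers, and footnotes would need extra cases.

```python
def extract_text(document):
    """Concatenate the text runs of every paragraph in a Docs API document."""
    pieces = []
    for element in document.get('body', {}).get('content', []):
        paragraph = element.get('paragraph')
        if not paragraph:
            continue  # skip section breaks, tables, and other non-paragraph elements
        for part in paragraph.get('elements', []):
            text_run = part.get('textRun')
            if text_run:
                pieces.append(text_run.get('content', ''))
    return ''.join(pieces)


# Hand-written sample mimicking the shape of a documents().get() response.
sample = {
    'title': 'Hello World',
    'body': {'content': [
        {'sectionBreak': {}},
        {'paragraph': {'elements': [{'textRun': {'content': 'Hello\n'}}]}},
        {'paragraph': {'elements': [{'textRun': {'content': 'World\n'}}]}},
    ]},
}
print(extract_text(sample))
```

With an authorized Docs service endpoint (call it DOCS, by analogy with DRIVE further down; DOC_ID is a placeholder for your document's ID), the live document would come from DOCS.documents().get(documentId=DOC_ID).execute() and could be passed straight to extract_text().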

(Feb 2017) The code in the OP and the only other answer are both now out-of-date, as ClientLogin authentication was deprecated back in 2012(!), and GData APIs are the previous generation of Google APIs. While not all GData APIs have been deprecated, none of the newer Google APIs use the Google Data protocol.

There isn't a REST API available (at this time) for Google Docs documents, although there is an "API-like" service provided by Google Apps Script, the JavaScript-in-the-cloud solution which provides programmatic access to Google Docs (via its Document service, exposed as the DocumentApp object), including Docs add-ons.

Reading plain text from a Google Doc is considered file-level access, so you would use the Google Drive API instead. Examples of using the Drive API:

Exporting a Google Sheet as CSV (blog post)
"Poor man's plain text to PDF" converter (blog post) (*)

(*) - TL;DR: upload plain text file to Drive, import/convert to Google Docs format, then export that Doc as PDF. Post above uses Drive API v2; this follow-up post describes migrating it to Drive API v3, and here's a developer video combining both "poor man's converter" posts.
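The upload, convert, and export steps of that converter can be sketched roughly as follows with Drive API v3. This is a minimal illustration rather than the code from the posts: the function name text_to_pdf and its parameters are assumptions, drive is taken to be an already-authorized service endpoint, and media is an upload object the caller creates.

```python
DOCS_MIME = 'application/vnd.google-apps.document'
PDF_MIME = 'application/pdf'


def text_to_pdf(drive, media, name):
    """Upload plain text as a Google Doc, then export that Doc as PDF bytes."""
    # Asking for the Google Docs MIME type in the metadata triggers the conversion.
    metadata = {'name': name, 'mimeType': DOCS_MIME}
    created = drive.files().create(
        body=metadata, media_body=media, fields='id').execute()
    # Export the newly converted Doc; the response body is the PDF content.
    return drive.files().export(
        fileId=created['id'], mimeType=PDF_MIME).execute()
```

In practice media would typically be googleapiclient.http.MediaFileUpload('hello.txt', mimetype='text/plain'), where hello.txt is a placeholder filename, and the returned bytes would be written to a .pdf file opened in binary mode.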

The solution for the OP is to perform operations similar to those in both posts above, but with the text/plain export MIME type. For other import/export formats to/from Drive, see this SO answer to a related question as well as the "downloading files from Drive" docs page. Here's some pseudocode that searches my Drive for Google Docs documents named "Hello World" and displays the contents of the first matching file on-screen (assuming DRIVE is your API service endpoint):

from __future__ import print_function

NAME = 'Hello World'
DOCS_MIME = 'application/vnd.google-apps.document'  # match native Google Docs only
MIME = 'text/plain'                                 # export format

# DRIVE is assumed to be an authorized Drive API v3 service endpoint.
# If using v2, change 'pageSize' to 'maxResults', 'name=' to 'title=',
# and ".get('files')" to ".get('items')".
query = "name='%s' and mimeType='%s'" % (NAME, DOCS_MIME)
res = DRIVE.files().list(q=query, pageSize=1).execute().get('files')
if res:
    fileID = res[0]['id']  # 1st matching "Hello World" Doc
    data = DRIVE.files().export(fileId=fileID, mimeType=MIME).execute()
    if data:
        print(data.decode('utf-8'))  # export returns bytes; decode to text

If you need more than this, see these videos on how to set up and use Google APIs, handle OAuth2 authorization, and create a Drive service endpoint to list your Drive files, plus a corresponding blog post for all three.
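For reference, one possible way to create the DRIVE endpoint is sketched below. It uses the google-auth, google-auth-oauthlib, and google-api-python-client packages, which may differ from what the videos show. The function name build_drive_service and the file names credentials.json and token.json are assumptions (the first being the OAuth client secrets file downloaded from the developer console), and the sketch targets Python 3.

```python
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']  # read-only Drive access


def build_drive_service(secrets_path='credentials.json', token_path='token.json'):
    """Return an authorized Drive API v3 service endpoint."""
    # Imported here so this sketch can be loaded without the client libraries installed.
    import os
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    creds = None
    if os.path.exists(token_path):
        # Reuse tokens saved by an earlier run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silently renew an expired access token
        else:
            # First run: open a browser window for the user to grant access.
            flow = InstalledAppFlow.from_client_secrets_file(secrets_path, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(token_path, 'w') as token:
            token.write(creds.to_json())  # cache tokens for next time
    return build('drive', 'v3', credentials=creds)
```

Calling DRIVE = build_drive_service() once at the top of a script would then supply the endpoint that the Drive snippet above assumes.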

To learn more about how to use Google APIs with Python in general, check out my blog as well as a variety of Google developer videos (series 1 and series 2) I'm producing.

Tags: python, download, google-docs, google-docs-api

user2046358
