Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

read.csv fails to read a CSV file from google docs

Tags:

r

google-docs

I wish to use read.csv to read a google doc spreadsheet.

I try using the following code:

data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
read.csv(data_url)

Which results in the following error:

Error in file(file, "rt") : cannot open the connection

I'm on windows 7. And the code was tried on R 2.12 and 2.13

I remember trying this a few months ago and it worked fine. Any suggestion what might be causing this or how to solve it?

Thanks.

like image 807
Tal Galili Avatar asked Dec 13 '22 13:12

Tal Galili


2 Answers

It might have something to do with the fact that Google is reporting a 302 temporarily moved response.

> download.file(data_url, "~/foo.csv", method = "wget")
--2011-04-29 18:01:01--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 74.125.230.132, 74.125.230.128, 74.125.230.130, ...
Connecting to spreadsheets0.google.com|74.125.230.132|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 18:01:01--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|74.125.230.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `/home/gavin/foo.csv'

    [ <=>                                                                                                                                                   ] 41          --.-K/s   in 0s      

2011-04-29 18:01:02 (1.29 MB/s) - `/home/gavin/foo.csv' saved [41]

> read.csv("~/foo.csv")
  column1 column2
1       a       1
2       b       2
3      ds       3
4       d       4
5       f       5
6      ga       5

I'm not sure R's internal download code is capable of responding to such redirects:

> download.file(data_url, "~/foo.csv")
trying URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Error in download.file(data_url, "~/foo.csv") : 
  cannot open URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
like image 110
Gavin Simpson Avatar answered Jan 05 '23 10:01

Gavin Simpson


I ran into the same problem and eventually found a solution in a forum thread. Using my own public CSV file:

library(RCurl)
tt = getForm("https://spreadsheets.google.com/spreadsheet/pub", 
          hl ="en_US", key = "0Aonsf4v9iDjGdHRaWWRFbXdQN1ZvbGx0LWVCeVd0T1E", 
          output = "csv", 
         .opts = list(followlocation = TRUE, verbose = TRUE, ssl.verifypeer = FALSE)) 

holidays <- read.csv(textConnection(tt))
like image 44
John Horton Avatar answered Jan 05 '23 10:01

John Horton