
Zip file error when reading in an https URL

Tags:

r

I'm trying to learn how to read a zipped Access file from an https URL into R. This is part of a larger mapping project I'm undertaking to broaden my R skills, found HERE (I will link this post back there as well).

This was the plan, but I get an error from getURL and I'm not sure why:

require(RCurl)
NYSdemo <- getURL("https://reportcards.nysed.gov/zip/SRC2010.zip")
temp <- tempfile()
download.file(NYSdemo, temp)
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)

ERROR:

> NYSdemo <- getURL("https://reportcards.nysed.gov/zip/SRC2010.zip")
Error in function (type, msg, asError = TRUE)  : 
  SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Like I said, this is a learning project, so I am not at all familiar with many of the techniques I'm using here.

The actual zip file I'm trying to download is HERE.

Maybe this isn't actually a programming problem, but rather something about the URL that prevents getURL from working on it.

Thank you in advance for your ideas and help.

EDIT: I tried the ssl.verifypeer option but get another error:

> NYSdemo <- getURL("https://reportcards.nysed.gov/zip/SRC2010.zip",
+ ssl.verifypeer = FALSE)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
  embedded nul in string: 'PK\003\004\024\0\0\0\b\0i[j>¶U#]tó\036\005\0 ÷- {And it continues}
> 
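The "embedded nul in string" message is consistent with getURL trying to return the download as a character string: a zip archive starts with the bytes PK\003\004 and contains nul bytes, which an R character string cannot hold. A minimal offline sketch, using a small made-up raw vector as a hypothetical stand-in for the downloaded bytes, shows the same bytes are fine as a raw vector but fail when converted to a string:

```r
# Hypothetical stand-in for the first bytes of a zip archive:
# the "PK\003\004" signature, then bytes that include embedded nuls
zip_bytes <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00, 0x08, 0x69))

# Kept as a raw vector, the bytes are intact
starts_with_pk <- identical(zip_bytes[1:2], charToRaw("PK"))
print(starts_with_pk)

# Converting them to a single character string fails on the embedded nuls
result <- tryCatch(rawToChar(zip_bytes),
                   error = function(e) conditionMessage(e))
print(result)
```

This is why a function that returns raw bytes (such as RCurl's getBinaryURL) is the better fit for binary files than getURL.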

EDIT 2: Per Vincent's suggestions:

> NYSdemo <- getURL("http://reportcards.nysed.gov/zip/SRC2010.zip")
> download.file(NYSdemo, temp)
Error in download.file(NYSdemo, temp) : unsupported URL scheme
> 
> NYSdemo <- getBinaryURL("https://reportcards.nysed.gov/zip/SRC2010.zip")
Error in function (type, msg, asError = TRUE)  : 
  SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
> 
> url.exists("https://reportcards.nysed.gov/zip/SRC2010.zip")
[1] FALSE   # not sure why, because the URL works when typed into a browser's address bar

This information is leading me to believe that the problem is something strange about the zip file. Ideas?

asked Feb 16 '12 at 04:02 by Tyler Rinker


1 Answer

While you didn't believe me over at TS, I tested my solution with the help of whoever suggested the ssl.verifypeer idea.

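library(RCurl)
# Fetch the zip as a raw vector; ssl.verifypeer = FALSE skips the certificate check
bin <- getBinaryURL("https://reportcards.nysed.gov/zip/SRC2010.zip",
                    ssl.verifypeer=FALSE)
# Write the raw bytes to disk in binary mode, then close the connection
con <- file("schools.zip", open = "wb")
writeBin(bin, con)
close(con)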

Since the file is big, it took me a while to download the binary, but writing it to disk was very fast. Make sure to close the connection so you can open your ZIP file afterwards. I was able to open both the PDF and the Access database.
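The advice to close the connection can be made automatic. A minimal sketch, assuming the downloaded bytes are already in a raw vector like bin above; write_raw_file is a hypothetical helper name, and on.exit() makes close() run even if writeBin() errors:

```r
# Hypothetical helper: write a raw vector (e.g. the result of getBinaryURL())
# to a file in binary mode, always closing the connection on exit
write_raw_file <- function(bin, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))  # runs when the function returns or fails
  writeBin(bin, con)
  invisible(path)
}

# Offline usage with a small made-up raw vector instead of a real download
bytes <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
out <- write_raw_file(bytes, tempfile(fileext = ".zip"))
print(readBin(out, what = "raw", n = file.size(out)))
```

Once a real archive is on disk, unzip(path, list = TRUE) lists its members, which is a quick way to check the actual file names before calling unz() (the question assumed a member named a1.dat).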

answered Nov 09 '22 at 12:11 by Bryan Goodrich