the code below works fine in interactive mode but fails when used in a function. it's simply two authentication POST
requests followed by the data download. my goal is to get this working inside a function, not just in interactive mode.
this question is sort of a sequel to this question. icpsr recently updated their website. the minimal reproducible example below requires a free account, available at
https://www.icpsr.umich.edu/rpxlogin?path=ICPSR&request_uri=https%3a%2f%2fwww.icpsr.umich.edu%2ficpsrweb%2findex.jsp
i tried adding Sys.sleep(1) and various httr::GET / httr::POST calls but nothing worked.
my_download <-
  function( your_email , your_password ){
    values <-
      list(
        agree = "yes" ,
        path = "ICPSR" ,
        study = "21600" ,
        ds = "" ,
        bundle = "rdata" ,
        dups = "yes" ,
        email = your_email ,
        password = your_password
      )
    httr::POST( "https://www.icpsr.umich.edu/cgi-bin/terms" , body = values )
    httr::POST( "https://www.icpsr.umich.edu/rpxlogin" , body = values )
    tf <- tempfile()
    httr::GET(
      "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
      query = values ,
      httr::write_disk( tf , overwrite = TRUE ) ,
      httr::progress()
    )
  }
# fails
my_download( "[email protected]" , "some_password" )
# stepping through works
debug( my_download )
my_download( "[email protected]" , "some_password" )
EDIT: the failure simply downloads this page as if not logged in (rather than the dataset), so it's losing the authentication for some reason. if you are logged in to icpsr, use private browsing to see the page:
https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?study=21600&ds=1&bundle=rdata&path=ICPSR
thanks!
This sort of thing can happen because of the state (such as cookies) that the httr package stores in the handle for each URL (see ?handle).
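To make that handle state visible, here is a minimal sketch that passes one explicit handle to every request, so cookies set by the login are sent along with the download. The function name my_download_shared_handle is hypothetical, and this has not been confirmed to resolve the ICPSR case on its own; it only illustrates how httr::handle() and the handle = argument fit together.

```r
library(httr)

# Hypothetical variant of my_download(): all three requests reuse one explicit
# handle, so they share a single cookie store. Not verified against ICPSR.
my_download_shared_handle <- function(your_email, your_password) {
  # one handle for the whole host; cookies persist on it between requests
  h <- handle("https://www.icpsr.umich.edu")

  values <- list(
    agree = "yes",
    path = "ICPSR",
    study = "21600",
    ds = "",
    bundle = "rdata",
    dups = "yes",
    email = your_email,
    password = your_password
  )

  # `path` is combined with the handle's base URL by httr
  POST(handle = h, path = "rpxlogin", body = values)
  POST(handle = h, path = "cgi-bin/terms", body = values)

  tf <- tempfile()
  GET(
    handle = h,
    path = "cgi-bin/bob/zipcart2",
    query = values,
    write_disk(tf, overwrite = TRUE),
    progress()
  )

  # return the path of the downloaded file
  tf
}
```

Calling httr::cookies() on any of the responses is one way to inspect which cookies the server has set on the handle so far.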
In this particular case it remains unclear what exactly makes it work, but one strategy is to include a GET request to https://www.icpsr.umich.edu/cgi-bin/bob/ prior to authenticating and requesting the data. For example,
my_download <-
  function( your_email , your_password ){
    ## for some reason this is required ...
    httr::GET( "https://www.icpsr.umich.edu/cgi-bin/bob/" )
    values <-
      list(
        agree = "yes" ,
        path = "ICPSR" ,
        study = "21600" ,
        ds = "" ,
        bundle = "rdata" ,
        dups = "yes" ,
        email = your_email ,
        password = your_password
      )
    httr::POST( "https://www.icpsr.umich.edu/rpxlogin" , body = values )
    httr::POST( "https://www.icpsr.umich.edu/cgi-bin/terms" , body = values )
    tf <- tempfile()
    httr::GET(
      "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
      query = values ,
      httr::write_disk( tf , overwrite = TRUE ) ,
      httr::progress()
    )
  }
appears to work correctly, though it remains unclear what exactly the GET request to https://www.icpsr.umich.edu/cgi-bin/bob/ does or why it is needed.
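Since the failure mode described in the question is a silently downloaded HTML page instead of the dataset, a quick check on the saved file can catch it. The sketch below uses only base R; the helper name looks_like_html and the 512-byte window are assumptions for illustration, not part of httr.

```r
# Hypothetical helper: TRUE when the file at `path` begins like an HTML page,
# which here would suggest the login page was saved instead of the data.
looks_like_html <- function(path, n = 512L) {
  # read the first `n` bytes without assuming any text encoding
  bytes <- as.integer(readBin(path, what = "raw", n = n))
  # keep printable ASCII only, so binary content cannot break rawToChar()
  bytes <- bytes[bytes >= 32L & bytes < 127L]
  head_text <- tolower(rawToChar(as.raw(bytes)))
  grepl("<!doctype html|<html", head_text)
}
```

For example, after a download to tf, something like `if (looks_like_html(tf)) stop("got a web page, not the dataset")` would turn the silent failure into an error.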