I currently have an R script with a for loop that runs around 2000 times; on each iteration it queries data from a database via a URL and reads it into a variable with the read.csv function.
My problem is: when I query small amounts of data (around 10000 rows), each iteration takes around 12 seconds and it's fine. But now I need to query around 50000 rows per iteration, and the query time increases quite a lot, to 50 seconds or so. That is still fine for me, but sometimes the server takes longer to send the data (≈75-90 seconds), the connection apparently times out, and I get this error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open: HTTP status was '0 (nil)'
or this one:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : InternetOpenUrl failed: 'The operation timed out'
I don't get the same warning every time; it alternates between those two.
Now, what I want is to prevent my program from stopping when this happens, or simply to avoid the timeout error by telling R to wait longer for the data. I have tried these settings at the start of my script as a possible solution, but it keeps happening:
options(timeout=190)
setInternet2(use=NA)
setInternet2(use=FALSE)
setInternet2(use=NA)
Any other suggestions or workarounds? Maybe skip to the next iteration when this happens, and store the loop indices where the connection error occurred in a variable, so that only those skipped i's can be queried again at the end? The ideal solution would, of course, be to avoid this error altogether.
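The skip-and-retry idea can be sketched with tryCatch(). This is only an illustration, not the asker's actual code: url_vect is a hypothetical vector holding the 2000 query URLs.

```r
## Sketch: wrap each query in tryCatch(), skip on error, and record the
## failed iteration numbers so they can be retried at the end.
## 'url_vect' is a hypothetical vector of the 2000 query URLs.

failed  <- integer(0)                     # iterations that errored out
results <- vector("list", length(url_vect))

for (i in seq_along(url_vect)) {
  results[[i]] <- tryCatch(
    read.csv(url_vect[i]),
    error = function(e) {
      message("Iteration ", i, " failed: ", conditionMessage(e))
      failed <<- c(failed, i)             # remember it for a later retry
      NULL                                # placeholder so the loop continues
    }
  )
}

## Retry only the skipped iterations at the end
for (i in failed) {
  results[[i]] <- tryCatch(read.csv(url_vect[i]),
                           error = function(e) NULL)
}
```

The `<<-` inside the handler updates the `failed` vector defined outside the loop, so the script keeps running and you know exactly which queries to repeat.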
A solution using the RCurl package: you can change the timeout option using

curlSetOpt(timeout = 200)

or by passing it into the call to getURL:

getURL(url_vect[i], timeout = 200)
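Put together, the RCurl approach could look like the following sketch (assuming RCurl is installed and url_vect[i] holds the query URL):

```r
## Sketch: fetch the CSV text with RCurl's getURL(), using a longer
## curl timeout, then parse it with read.csv(text = ...).
library(RCurl)

txt <- getURL(url_vect[i], timeout = 200)  # wait up to 200 s for the server
dat <- read.csv(text = txt)                # parse the downloaded text
```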
A solution using base R: simply download each file using download.file, and then worry about manipulating those files later.
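This separates the flaky network step from the parsing step. A minimal sketch, again assuming a hypothetical url_vect:

```r
## Sketch: download every query result to disk first, then parse locally.
## A failed download is noted and can simply be re-run for that file.
for (i in seq_along(url_vect)) {
  destfile <- sprintf("query_%04d.csv", i)
  ok <- tryCatch(
    download.file(url_vect[i], destfile, quiet = TRUE) == 0,  # 0 = success
    error = function(e) FALSE
  )
  if (!ok) message("Download ", i, " failed; retry it later")
}

## Later, with no network involved:
dat <- read.csv("query_0001.csv")
```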
I see this is an older post, but it still comes up early in the list of Google results, so...
If you are downloading via WinInet (rather than curl, internal, wget, etc.), options, including the timeout, are inherited from the system. Thus, you cannot set the timeout in R; you must change the Internet Explorer settings. See these Microsoft references for details: https://support.microsoft.com/en-us/kb/181050 https://support.microsoft.com/en-us/kb/193625
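Alternatively, on reasonably recent versions of R for Windows you can sidestep WinInet entirely by choosing a different download method, so that options(timeout = ...) is actually honoured. A sketch (url_vect[i] is assumed to hold the query URL):

```r
## Sketch: bypass WinInet by forcing the libcurl download method,
## so the R-level timeout option takes effect.
options(timeout = 200)
destfile <- "chunk.csv"
download.file(url_vect[i], destfile, method = "libcurl")
dat <- read.csv(destfile)
```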