
Timeout while reading csv file from url in R

Tags: r, csv, timeout

I currently have an R script that runs a for loop about 2000 times. On each iteration it queries a database through a URL and uses read.csv to store the returned data in a variable.

My problem: when I query small amounts of data (around 10,000 rows), each iteration takes about 12 seconds and everything works. Now I need around 50,000 rows per iteration, and the query time rises to roughly 50 seconds per iteration. That is acceptable, but sometimes the server takes longer to send the data (≈75-90 seconds), and apparently the connection times out with one of these errors:

Error in file(file, "rt") : cannot open the connection

In addition: Warning message:

In file(file, "rt") : cannot open: HTTP status was '0 (nil)'

or this one:

Error in file(file, "rt") : cannot open the connection

In addition: Warning message:

In file(file, "rt") : InternetOpenUrl failed: 'The operation timed out'

I don't get the same warning every time; it alternates between those two.

What I want is to keep my program from stopping when this happens or, better, to prevent the timeout altogether by telling R to wait longer for the data. I have tried the following settings at the start of my script, but the errors keep happening:

options(timeout=190)
setInternet2(use=NA)
setInternet2(use=FALSE)
setInternet2(use=NA)

Any other suggestions or workarounds? Perhaps the loop could skip to the next iteration when this happens and record the index i of each failed iteration, so that only those skipped queries are re-run at the end? The ideal solution, of course, would be to avoid the error altogether.
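A minimal sketch of the skip-and-retry idea, assuming `urls` is a character vector holding the ~2000 query URLs (a hypothetical name; the question does not show how the URLs are built). `tryCatch()` turns a failed `read.csv()` into `NULL`, the failing index is recorded, and only those indices are queried again on the next pass.

```r
# Sketch only: `urls` is a hypothetical character vector of query URLs.
fetch_all <- function(urls, max_passes = 3, wait = 30) {
  results <- vector("list", length(urls))  # one slot per query
  pending <- seq_along(urls)               # indices not yet fetched
  for (pass in seq_len(max_passes)) {
    failed <- integer(0)
    for (i in pending) {
      res <- tryCatch(
        read.csv(urls[i]),
        error = function(e) {
          # Report the failure but keep the loop going
          message("Pass ", pass, ", query ", i, " failed: ", conditionMessage(e))
          NULL
        }
      )
      if (is.null(res)) failed <- c(failed, i) else results[[i]] <- res
    }
    pending <- failed
    if (length(pending) == 0) break
    # Pause before the next pass so a busy server gets some slack
    if (pass < max_passes) Sys.sleep(wait)
  }
  # `failed` holds the indices that never succeeded (empty if all worked)
  list(data = results, failed = pending)
}
```

Usage would look like `out <- fetch_all(urls)`; afterwards `out$failed` lists the loop indices that still need to be queried, and `out$data[[i]]` holds the data frame for each successful query.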

ANieder asked Oct 09 '13 08:10



2 Answers

A solution using the RCurl package:

You can change the timeout option using

curlSetOpt(timeout = 200)

or by passing it into the call to getURL

getURL(url_vect[i], timeout = 200)
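A sketch of the per-call variant, assuming the RCurl package is installed. `read_csv_curl` is a hypothetical helper name; it fetches the body as text with a longer timeout and then parses the string with `read.csv(text = ...)`.

```r
# Sketch: assumes the RCurl package is installed; read_csv_curl is hypothetical.
read_csv_curl <- function(url, timeout = 200, connect_timeout = 30) {
  # Extra named arguments to getURL() are treated as curl options:
  # `timeout` limits the whole transfer, `connecttimeout` only the connection phase
  txt <- RCurl::getURL(url, timeout = timeout, connecttimeout = connect_timeout)
  # read.csv() can parse the downloaded string directly via `text =`
  read.csv(text = txt)
}
```

In the loop this becomes `dat <- read_csv_curl(url_vect[i])`. A transfer that still fails raises an R error, so the call can be wrapped in `tryCatch()` to skip and retry failed indices.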

A solution using base R:

Simply download each file using download.file, and worry about processing those files later.
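A minimal sketch of the download-first approach, assuming a vector `urls` of query URLs (hypothetical name). `download.file()` is documented to use `options("timeout")` (in seconds, default 60), so the sketch raises it first; each download is wrapped in `tryCatch()` so one failure does not stop the loop, and a success flag is kept per URL.

```r
# Raise the Internet timeout (seconds) used by download.file()
options(timeout = max(300, getOption("timeout")))

# Sketch: download every URL to its own file and record which ones worked.
download_all <- function(urls, dir = tempdir()) {
  dest <- file.path(dir, sprintf("query_%04d.csv", seq_along(urls)))
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    status <- tryCatch(
      download.file(urls[i], dest[i], quiet = TRUE),
      error = function(e) NA_integer_   # treat an error like a failed download
    )
    ok[i] <- isTRUE(status == 0)        # download.file() returns 0 on success
  }
  data.frame(url = urls, file = dest, ok = ok, stringsAsFactors = FALSE)
}
```

Once everything is on disk, the successful files can be parsed with `lapply(res$file[res$ok], read.csv)`, and the URLs in `res$url[!res$ok]` can be downloaded again later.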

Richie Cotton answered Nov 14 '22 22:11


I see this is an older post, but it still comes up early in the list of Google results, so...

If you are downloading via WinInet (rather than curl, internal, wget, etc.), options such as the timeout are inherited from the system, so you cannot set the timeout in R; you must change the Internet Explorer settings instead. See the Microsoft references for details: https://support.microsoft.com/en-us/kb/181050 https://support.microsoft.com/en-us/kb/193625
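If changing the Internet Explorer settings is not an option, one way around WinInet is to request a different method for the connection. The sketch below assumes R 3.2.0 or later built with libcurl support (check `capabilities("libcurl")`); `read_csv_libcurl` is a hypothetical helper that opens the URL with `method = "libcurl"` and raises `options(timeout)` only for the duration of the call. How the timeout is applied under libcurl can differ between R versions, so check `?download.file` for your installation.

```r
# Sketch: bypass WinInet by asking for the libcurl method explicitly.
read_csv_libcurl <- function(u, timeout = 300) {
  old <- options(timeout = timeout)   # raise the Internet timeout (seconds)
  on.exit(options(old), add = TRUE)   # restore the previous value afterwards
  con <- url(u, method = "libcurl")   # connection that does not use WinInet
  read.csv(con)                       # read.csv() opens and closes `con` itself
}
```

Because the old option value is restored on exit, the rest of the script keeps its original timeout setting.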

acarter answered Nov 14 '22 22:11