getURL (from RCurl package) doesn't work in a loop

Tags: r, rcurl

I have a list of URLs named URLlist, and I loop over it to get the source code of each URL:

for (k in 1:length(URLlist)){
    temp = getURL(URLlist[k])
}

The problem is that for some seemingly random URLs the code gets stuck, and I get this error message:

Error in function (type, msg, asError = TRUE)  : 
    transfer closed with outstanding read data remaining

But when I call getURL outside the loop on the URL that caused the problem, it works perfectly.

Any help, please? Thank you very much.

user2187202 asked Oct 21 '22 10:10


1 Answer

It's hard to tell for sure without more information, but the requests may simply be getting sent too quickly, in which case pausing between requests could help:

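library (RCurl)

# seq_along() also behaves correctly when URLlist is empty
for (k in seq_along (URLlist)) {
    temp = getURL (URLlist[k])
    Sys.sleep (0.2)    # pause 0.2 seconds between requests
}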

I'm assuming that your actual code does something with 'temp' before writing over it in every iteration of the loop, and whatever it does is very fast.
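
If the pages need to be kept rather than processed on the spot, one option is to store each result instead of overwriting a single variable. This is only a minimal sketch: fetch_pages, its fetch argument and pause are names made up for illustration, and fetch is assumed to return one string per URL (it defaults to RCurl's getURL).

```r
# Hypothetical helper (not part of RCurl): download every URL in turn,
# pausing between requests, and keep every result.
fetch_pages <- function(urls, fetch = RCurl::getURL, pause = 0.2) {
    urls <- as.character(unlist(urls))   # accept a list or a character vector
    pages <- character(length(urls))     # pre-allocate one slot per URL
    names(pages) <- urls
    for (k in seq_along(urls)) {
        pages[k] <- fetch(urls[k])
        Sys.sleep(pause)                 # short pause so requests are not sent back to back
    }
    pages
}
```

Calling fetch_pages (URLlist) would then return a character vector named by URL, so the source of a given page can be looked up by its address.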

You could also try building in some error handling so that one problem doesn't kill the whole thing. Here's a crude example that tries twice on each URL before giving up:

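for (url in URLlist) {
    temp = try (getURL (url))
    # inherits() is safer than class() ==, since class() can return several values
    if (inherits (temp, "try-error")) {
        temp = try (getURL (url))    # second and last attempt
        if (inherits (temp, "try-error")) {
            temp = paste ("error accessing", url)
        }
    }
    Sys.sleep (0.2)
}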
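
The same idea can be generalised to any number of attempts with tryCatch. Again, this is a sketch under assumptions rather than a definitive fix: get_with_retry, max_tries, pause and the fetch argument are illustrative names, not part of RCurl, and returning NA on failure is just one possible convention.

```r
# Hypothetical helper: try a URL up to `max_tries` times, pausing after
# each failed attempt; returns NA_character_ if every attempt fails.
get_with_retry <- function(url, fetch = RCurl::getURL,
                           max_tries = 3, pause = 0.2) {
    for (attempt in seq_len(max_tries)) {
        # An error in fetch() is turned into NULL instead of stopping the loop
        result <- tryCatch(fetch(url), error = function(e) NULL)
        if (!is.null(result)) {
            return(result)
        }
        Sys.sleep(pause)
    }
    NA_character_
}
```

Assuming each successful download is a single string, something like pages = vapply (URLlist, get_with_retry, character (1)) would collect every page, with NA marking the URLs that never succeeded, so they can be retried later.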

janattack answered Oct 24 '22 09:10