I have a double loop like the one shown below. The problem is that R (2.15.2) uses more and more memory and I do not understand why.
I understand that memory use has to grow within the inner loop because of the rbind()
I am doing there. What I do not understand is why R keeps grabbing memory when a new cycle of the outer loop starts, even though the objects ('xmlCatcher') are reused:
# !!!BEWARE this example creates a lot of files (n=1000)!!!!
require(XML)
chunk <- function(x, chunksize){
# source: http://stackoverflow.com/a/3321659/1144966
x2 <- seq_along(x)
split(x, ceiling(x2/chunksize))
}
chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
for(i in 1:1000){
writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n <nr>',i,'</nr>\n <from>Jani</from>\n <heading>Reminder</heading>\n ',sep=""), paste(rep('<body>Do not forget me this weekend!</body>\n',sample(1:10, 1)),sep="" ) , ' </note>')
,paste("test",i,".xml",sep=""))
}
for(k in 1:length(chunky)){
gc()
print(chunky[[k]])
xmlCatcher <- NULL
for(i in 1:length(chunky[[k]])){
filename <- chunky[[k]][i]
xml <- xmlTreeParse(filename)
xml <- xmlRoot(xml)
result <- sapply(getNodeSet(xml,"//body"), xmlValue)
id <- sapply(getNodeSet(xml,"//nr"), xmlValue)
dummy <- cbind(id,result)
xmlCatcher <- rbind(xmlCatcher,dummy)
}
save(xmlCatcher,file=paste("xmlCatcher",k,".RData",sep=""))
}
Does somebody have an idea why this behaviour might occur? Note that all the objects (like 'xmlCatcher') are reused every cycle, so I would assume that the RAM used should stay about the same after the first 'chunk' cycle.
Is this a bug or am I missing something?
Your understanding of reusing memory is wrong.
When you create the new DummyCatcher, the old one is no longer referenced and becomes a candidate for garbage collection, which will happen at some point.
You are not reusing memory; you are creating a new object and abandoning the old one.
Garbage collection will eventually free that memory.
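To make the point above concrete, here is a minimal sketch in base R. The helper name `vcells_used` and the vector size are my own choices for illustration. It reads the number of vector cells in use from gc(), which also forces a collection, before and after a large object is dropped.

```r
# Hypothetical helper: report vector cells currently in use.
# Calling gc() also triggers a garbage collection.
vcells_used <- function() gc()["Vcells", "used"]

base <- vcells_used()

x <- numeric(1e7)            # about 80 MB of doubles
with_x <- vcells_used()      # usage goes up by roughly 1e7 cells

x <- NULL                    # the big vector is no longer referenced
after_drop <- vcells_used()  # collected, so usage falls back again

print(c(base = base, with_x = with_x, after_drop = after_drop))
```

Assigning a new value to `x` does not overwrite the old vector in place. The old one simply becomes unreachable, and its memory is returned only once a collection runs.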
Also, I suggest you look at Rprofmem to profile your memory use.
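As a rough sketch of how Rprofmem could be used here: it only works if R was built with memory profiling enabled, so the example checks capabilities("profmem") first. The toy loop, the 1000-byte threshold and the temporary log file are assumptions for illustration, not part of the original code.

```r
if (capabilities("profmem")) {
  log_file <- tempfile(fileext = ".out")
  Rprofmem(log_file, threshold = 1000)   # log allocations larger than 1000 bytes

  acc <- NULL
  for (i in 1:200) {
    # Growing with rbind() copies the whole matrix on every iteration
    acc <- rbind(acc, cbind(id = i, value = runif(50)))
  }

  Rprofmem(NULL)                         # stop profiling
  n_allocs <- length(readLines(log_file))
  print(n_allocs)                        # number of logged allocations
} else {
  n_allocs <- NA
  message("This R build has no memory profiling support")
}
```

Each line in the log file records one allocation together with the call stack that caused it, so repeated rbind() calls show up directly.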
Chapter 2 of this talks about rbind as a common means of being a glutton.
You can avoid using rbind inside the loop:
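n <- length(chunky[[k]])          # number of files in the current chunk
my.list <- vector('list', n)      # preallocate the list once
dummy <- 0                        # placeholder for whatever each iteration produces
for(i in seq_len(n)) {
  dummy <- dummy + 1
  my.list[[i]] <- data.frame(dummy)   # store the piece, do not rbind yet
}
DummyCatcher <- do.call('rbind', my.list)   # bind everything in one call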
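Applied to the shape of the data in the question, the same pattern might look like the sketch below. `parse_one` is a hypothetical stand-in for the xmlTreeParse()/getNodeSet() step, so that the example runs without the XML package or any files on disk. It returns a small character matrix with an id column and one row per body element.

```r
# Hypothetical stand-in for parsing one XML file
parse_one <- function(i) {
  n_body <- (i %% 3) + 1
  cbind(id = rep(as.character(i), n_body),
        result = rep("Do not forget me this weekend!", n_body))
}

files <- 1:100

# Version from the question: grow the result with rbind() on every iteration
grown <- NULL
for (i in files) grown <- rbind(grown, parse_one(i))

# Alternative: collect the pieces in a preallocated list, bind once at the end
pieces <- vector("list", length(files))
for (j in seq_along(files)) pieces[[j]] <- parse_one(files[j])
collected <- do.call(rbind, pieces)

print(dim(collected))
```

Both versions produce the same matrix. The second one copies the accumulated data once instead of once per file.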