 

raster package filling up the entire hard drive

Tags:

r

raster

I am processing a time series of rasters (MODIS NDVI imagery) to calculate the average and standard deviation of the series. Each yearly series is composed of 23 NDVI .tif images of 508 MB each, so a year is a hefty 11.7 GB (23 × 508 MB) to process. Below is the script for one year; I have to repeat this for a number of years.

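library(raster)
# library("rgeos")  # not needed: none of the steps below use rgeos
# list this year's NDVI GeoTIFFs in the working directory
filesndvi <- list.files(pattern = "NDVI\\.tif$", full.names = TRUE)
filesetndvi10 <- stack(filesndvi)
names(filesetndvi10)
# per-pixel mean of all layers
avgndvi10 <- mean(filesetndvi10)
# deviations from the mean, then the sum of their squares over the layers
desviondvi10 <- filesetndvi10 - avgndvi10
sumdesvioc <- sum(desviondvi10^2)
# population variance (divide by n), standard deviation and coefficient of variation
varndvi10 <- sumdesvioc / nlayers(filesetndvi10)
sdndvi10 <- sqrt(varndvi10)
cvndvi10 <- sdndvi10 / avgndvi10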
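For reference, per pixel the script computes the population statistics (dividing by n rather than n - 1), where x_i is the pixel value in layer i and n = nlayers(filesetndvi10), i.e. 23 for a full year:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{sd} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{cv} = \frac{\mathrm{sd}}{\bar{x}}
$$

Note that R's own sd() divides by n - 1, so it would give slightly larger values than sdndvi10.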

The problem: the process keeps writing to the hard drive until it is full, and I don't know where on the disk it writes. The only way I have found to free the space is to reboot. I tried rm, which didn't work, and closing RStudio didn't work either. I'm using R 3.0.2 with RStudio 0.98.994 on Ubuntu 14.04, on an Asus UX31 with 4 GB RAM and a 256 GB disk. Any ideas on how to clean the disk after each year's calculation without rebooting would be very welcome. Thanks

user2942623 asked Aug 21 '14 12:08




2 Answers

I struggle with the same problem, but have a few tricks that help. First, get more memory: RAM and disk space are cheap and make a dramatic difference when dealing with large R objects such as rasters. Second, use removeTmpFiles() from the raster package. Its argument is an age in hours, and it removes temp files older than that; e.g. removeTmpFiles(0.5) removes temp files older than 30 minutes. Make sure you only call it when those files will no longer be needed. Third, use something like the rasterOptions() snippet below. Be careful with the memory chunk sizes: the values below are for my system and will likely NOT suit yours, but you may find settings that work better than the defaults. Finally, use rm() and gc() to clean as you cook. Hope this helps, but if you find a better solution please let me know.

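library(raster)

# 'drive' is not defined in the original snippet; it stands for a Windows
# drive letter (e.g. "D" gives "D:/RASTER_TEMP/"). On Linux use a normal path.
drive <- "D"
tmpdir_name <- paste0(drive, ":/RASTER_TEMP/")
if (!file.exists(tmpdir_name)) {
    dir.create(tmpdir_name, recursive = TRUE)
}

# Values tuned for my machine; adjust chunksize and maxmemory for yours
rasterOptions(datatype = "FLT4S",
    progress = "text",
    tmpdir = tmpdir_name,
    tmptime = 4,          # hours after which old temp files are deleted (when raster loads)
    timer = TRUE,
    tolerance = 0.5,
    chunksize = 1e+08,
    maxmemory = 1e+09)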
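Putting this advice together for the per-year loop in the question, a minimal sketch could look like the following. It assumes each year's files sit in their own folder (the `ndvi_<year>` folder names and the helper `process_year()` are hypothetical), that only the CV raster needs to be kept, and that nothing else still needs raster's temporary files when `removeTmpFiles(h = 0)` runs. If tmpdir points at a folder shared with other R sessions, as in the snippet above, that call could also delete files those sessions are still using.

```r
library(raster)

# Hypothetical helper: compute the per-pixel CV for one year's files,
# save it to a permanent file, then free the temporary disk space.
process_year <- function(files, outfile) {
    s <- stack(files)
    avg <- mean(s)
    sdev <- sqrt(mean((s - avg)^2))   # population sd, as in the question
    cv <- writeRaster(sdev / avg, filename = outfile, overwrite = TRUE)
    rm(s, avg, sdev)         # drop objects that may point at temp files
    gc()                     # let R release the memory they used
    removeTmpFiles(h = 0)    # delete raster's temp files older than 0 hours, i.e. all of them
    cv
}

# Hypothetical layout: one folder per year, e.g. "ndvi_2010", "ndvi_2011"
for (yr in 2010:2012) {
    files <- list.files(paste0("ndvi_", yr), pattern = "NDVI\\.tif$", full.names = TRUE)
    if (length(files) > 0) {
        process_year(files, paste0("cv_ndvi_", yr, ".grd"))
    }
}
```

Because the CV is written to an explicit filename outside raster's temp folder, it survives the clean-up at the end of each year.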
Mr.ecos answered Oct 22 '22 00:10


There are two other things to consider. First, make fewer intermediate files by combining steps in calc or overlay functions (there is not much scope for that here, but there is some). This can also speed up computations, as there is less reading from and writing to disk. Second, take control of deleting specific files. In the calc and overlay functions you can provide filenames, so that afterwards you can remove the files you no longer need. You can also delete the temp files explicitly. It is, of course, good practice to first remove the objects that point to these files. Here is an example based on yours.

library(raster)
# example data: four random layers standing in for the NDVI files
set.seed(0)
ndvi <- raster(ncols=10, nrows=10)
n1 <- setValues(ndvi, runif(100) * 2 - 1)
n2 <- setValues(ndvi, runif(100) * 2 - 1)
n3 <- setValues(ndvi, runif(100) * 2 - 1)
n4 <- setValues(ndvi, runif(100) * 2 - 1)
filesetndvi10 <- stack(n1, n2, n3, n4)

nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)
# squared deviations in one step, written to a named file
desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2, filename='over_tmp.grd', overwrite=TRUE)
# standard deviation: square root of the mean squared deviation
sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd', overwrite=TRUE)
cvndvi10 <- overlay(sdndvi10, avgndvi10, fun=function(x, y) x / y, filename='cvndvi10.grd', overwrite=TRUE)

# remove the objects first, then the files they point to
f <- filename(avgndvi10)
rm(avgndvi10, desviondvi10_2, sdndvi10)
# avgndvi10 only has a (temporary) file when the data are too large for memory
if (nchar(f) > 0) file.remove(c(f, extension(f, '.gri')))
file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))
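The idea of combining steps can be pushed further: mean, standard deviation and CV can all come out of a single calc() pass whose function returns three values per cell, so only one output file is written. This is a sketch, not code from the answer; `ndvi_stats()` and the output name `ndvi_stats.grd` are hypothetical, and the example data are built the same way as above.

```r
library(raster)

# Hypothetical per-cell function: x holds one cell's values across all layers
ndvi_stats <- function(x) {
    m <- mean(x)
    s <- sqrt(mean((x - m)^2))   # population sd (divide by n), as in the question
    c(m, s, s / m)               # mean, sd, coefficient of variation
}

# example data: four random layers in [-1, 1]
set.seed(0)
r <- raster(ncols = 10, nrows = 10)
st <- stack(lapply(1:4, function(i) setValues(r, runif(100) * 2 - 1)))

# one pass over the data, one output file with three layers
stats <- calc(st, fun = ndvi_stats, filename = "ndvi_stats.grd", overwrite = TRUE)
names(stats) <- c("mean", "sd", "cv")
stats
```

Whether this is faster than the raster-algebra version depends on the data: the function is evaluated cell by cell in R, so it trades less disk I/O for more computation.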

To find out where the temporary files are written, look at

rasterOptions()

or, to get the path as a variable, use:

dirname(rasterTmpFile()) 

To set the path, use

rasterOptions(tmpdir='a path')
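To check how much space raster's temporary files take before and after cleaning up, a small helper along these lines may be handy. It is a sketch: `raster_tmp_usage()` is a hypothetical name, and it assumes raster's temp folder holds only raster's own files. showTmpFiles() and removeTmpFiles() are functions from the raster package.

```r
library(raster)

# Hypothetical helper: count the files in raster's temp folder and their size in MB
raster_tmp_usage <- function() {
    tdir <- dirname(rasterTmpFile())   # raster's temp folder
    f <- list.files(tdir, full.names = TRUE)
    list(dir = tdir, n = length(f), mb = sum(file.size(f)) / 2^20)
}

u <- raster_tmp_usage()
cat(u$n, "files,", round(u$mb, 1), "MB in", u$dir, "\n")
showTmpFiles()          # raster's own listing of its temp files
# removeTmpFiles(h = 0) would then delete all of them; only run it once
# no object in the session still points at one of those files
```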
Robert Hijmans answered Oct 21 '22 22:10