Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reduce memory usage for mosaic on large list of rasters

Tags:

r

r-raster

I am using the mosaic function in the raster package to combine a long (11,000 files) list of rasters using the approach suggested by @RobertH here.

rlist <- sapply(list_names)
rlist$fun <- mean
rlist$na.rm <- TRUE
x <- do.call(mosaic, rlist)

As you might imagine, this eventually overruns my available memory (on several different machines and computing clusters). My question is: Is there a way to reduce the memory usage of either mosaic or do.call? I've tried altering maxmemory in rasterOptions(), but that does not seem to help. Processing the rasters in smaller batches seems problematic because the rasters may be spatially disjunct (i.e., sequential raster files may be located very far from each other). Thanks in advance for any help you can give.

like image 754
Matt Williamson Avatar asked Sep 12 '25 01:09

Matt Williamson


1 Answers

Rather than loading all rasters into memory at once (in the mosaic() call), can you process them one at a time? That way, you have your mosaic that updates each time you bring one more raster into memory, but then you can get rid of the new raster and just keep the continuously updating mosaic raster.

Assuming that your rlist object is a list of rasters, I'm thinking of something like:

Pseudocode

  1. Initialize an updating_raster object as the first raster in the list
  2. Loop through each raster in the list in turn, starting from the 2nd raster
  3. Read the ith raster into memory called next_raster
  4. Update the updating_raster object by overwriting it with the mosaic of itself and the next raster using a weighted mean

R code

Testing with the code in the mosaic() help file example...

First generate some rasters and use the standard mosaic method.

library(raster)

r <- raster(ncol=100, nrow=100)
r1 <- crop(r, extent(-10, 11, -10, 11))
r2 <- crop(r, extent(0, 20, 0, 20))
r3 <- crop(r, extent(9, 30, 9, 30))

r1[] <- 1:ncell(r1)
r2[] <- 1:ncell(r2)
r3[] <- 1:ncell(r3)

m1 <- mosaic(r1, r2, r3, fun=mean)

Put the rasters in a list so they are in a similar format as I think you have.

rlist <- list(r1, r2, r3)

Because of the NA handling of the weighted.mean() function, I opted to create the same effect by breaking down the summation and the division into distinct steps...

First initialize the summation raster:

updating_sum_raster <- rlist[[1]]

Then initialize the "counter" raster. This will represent the number of rasters that went into mosaicking at each pixel. It starts as a 1 in all cells that aren't NA. It should properly handle NAs such that it only will increment for a given pixel if a non-NA value was added to the updating sum.

updating_counter_raster <- updating_sum_raster
updating_counter_raster[!is.na(updating_counter_raster)] <- 1

Here's the loop that doesn't require all rasters to be in memory at once. The counter raster for the raster being added to the mosaic has a value of 1 only in the cells that aren't NA. The counter is updated by summing the current counter raster and the updating counter raster. The total sum is updated by summing the current raster values and the updating raster values.

for (i in 2:length(rlist)) {

  next_sum_raster <- rlist[[i]]
  next_counter_raster <- next_sum_raster
  next_counter_raster[!is.na(next_counter_raster)] <- 1

  updating_sum_raster <- mosaic(x = updating_sum_raster, y = next_sum_raster, fun = sum)
  updating_counter_raster <- mosaic(updating_counter_raster, next_counter_raster, fun = sum)

}

m2 <- updating_sum_raster / updating_counter_raster

The values here seem to match the use of the mosaic() function

identical(values(m1), values(m2))
> TRUE

But the rasters themselves aren't identical:

identical(m1, m2)
> FALSE

Not totally sure why, but maybe this gets you closer?

Perhaps compareRaster() is a better way to check:

compareRaster(m1, m2)
> TRUE

Hooray!

Here's a plot!

plot(m1)
text(m1, digits = 2)
plot(m2)
text(m2, digits = 2)

comparison between mosaic function and loop approach

A bit more digging in the weeds...

From the mosaic.R file:

It looks like the mosaic() function initializes a matrix called v to populate with the values from all the cells in all the rasters in the list. The number of rows in matrix v is the number of cells in the output raster (based on the full mosaicked extent and resolution), and the number of columns is the number of rasters to be mosaicked (11,000) in your case. Maybe you're running into the limits of matrix creation in R?

With a 1000 x 1000 raster (1e6 pixels), the v matrix of NAs takes up 41 GB. How big do you expect your final mosaicked raster to be?

r <- raster(ncol=1e3, nrow=1e3)
x <- 11000
v <- matrix(NA, nrow=ncell(r), ncol=x)
format(object.size(v), units = "GB")
[1] "41 Gb"
like image 163
mikoontz Avatar answered Sep 14 '25 14:09

mikoontz