 

rasterOptions: Difference between chunksize and maxmemory

Tags:

r

raster

r-raster

I recently stumbled upon two rasterOptions settings that can improve the performance of raster operations in R: chunksize and maxmemory. I am, however, confused about what the difference is. The help page states:

  • chunksize: Maximum number of cells to read/write in a single chunk while processing (chunk by chunk) disk based Raster* objects.

  • maxmemory: Maximum number of cells to read into memory. I.e., if a Raster* object has more than this number of cells, canProcessInMemory will return FALSE.

To my understanding they are both the same; at least I can't figure out from the definitions what the difference is, nor how they influence each other, e.g. a low chunksize combined with a high maxmemory value.

asked Feb 06 '23 by maRtin

1 Answer

These options control how the raster package processes disk-based Raster* objects, and you usually won't need to touch them unless you are writing a user-defined function that writes a raster chunk by chunk.

If your raster cannot be processed in memory, i.e. canProcessInMemory() returns FALSE, it has to be read chunk by chunk. In that case you supply the size of the chunk, expressed as a number of complete rows that are read together (one chunk at a time, or in parallel).

How many rows should you read in a single chunk? blockSize() helps you determine this.

r <- raster(system.file("external/test.grd", package = "raster"))  # small example raster shipped with the package
blockSize(r)  # suggested rows per chunk ($nrows), starting rows ($row) and number of chunks ($n)

Combined with writeValues(), you can manually write the values of a raster object chunk by chunk into an object of class RasterBrick, which is faster, or an object of class RasterLayer, which is more flexible.
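For illustration, here is a minimal sketch of that chunk-by-chunk pattern using writeStart(), writeValues() and writeStop(); the per-cell operation and the output filename are placeholders, not part of the original answer.

library(raster)
r   <- raster(system.file("external/test.grd", package = "raster"))
bs  <- blockSize(r)                         # chunking scheme suggested by raster
out <- raster(r)                            # empty raster with the same geometry
out <- writeStart(out, filename = tempfile(fileext = ".grd"), overwrite = TRUE)
for (i in seq_len(bs$n)) {
  v   <- getValues(r, row = bs$row[i], nrows = bs$nrows[i])  # read one chunk of complete rows
  out <- writeValues(out, v * 10, bs$row[i])                 # placeholder operation: multiply by 10
}
out <- writeStop(out)                       # close the file connection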

By default, a maximum of 1e8 cells is read into memory, which translates into different amounts of RAM depending on the data type of the cells. If you have a significant amount of memory, you can get nice performance gains by increasing maxmemory, and the gains grow with how much memory you can devote to it.
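As a rough illustration (assuming the cell-count based limit described above; the exact check also depends on your raster version and available RAM), you can see the effect with canProcessInMemory():

library(raster)
big <- raster(nrow = 20000, ncol = 20000)   # 4e8 cells, well above the 1e8-cell default
canProcessInMemory(big)                     # likely FALSE with default settings
rasterOptions(maxmemory = 1e9)              # raise the in-memory limit
canProcessInMemory(big)                     # now more likely TRUE, so no chunking is needed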

Increasing the chunk size is less valuable: larger chunks do buy some speed, but the returns diminish quickly, so it is not as important.

And while there is a marginal gain from a larger chunk size, raising it together with maxmemory can be a bad idea: you may end up forcing the entire raster into memory in a single calculation, which can make canProcessInMemory fail, stopping the processing of the raster, closing the connection, and leaving temp files behind.

A good rule of thumb is to keep the chunk size small enough to avoid any problems (somewhere around 1e5 you are unlikely to ever run into trouble), sacrificing a bit of performance, and to increase maxmemory as far as is feasible (1e9 or so, depending on how much RAM your rig has).
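Both values can be set through rasterOptions(); the numbers below follow that rule of thumb and are illustrative, not universal defaults.

library(raster)
rasterOptions(chunksize = 1e5, maxmemory = 1e9)  # modest chunks, generous in-memory limit
rasterOptions()                                  # print the current settings to confirm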

Finally, there is a nice vignette about writing custom functions for raster objects that are too large to fit in memory.

answered Feb 23 '23 by shayaa