I have 20 rasters with the same resolution and extent. It's a time series, with one raster per year, and I want to calculate the pixel-wise standard deviation across all rasters. So far, I am using the raster package:
qq2<-list(maxras1,maxras2,maxras3,maxras4,maxras5,maxras6,maxras7,maxras8,maxras9,maxras10)
qq2stack<-stack(qq2)
qq2mean<-mean(qq2stack)
qq2sd<-sd(qq2stack)
The mean works, but the standard deviation gives me this error:
Error in as.double(x) :
cannot coerce type 'S4' to vector of type 'double'
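For what it's worth, base R's sd() has no method for Raster* objects, which is what triggers the S4 coercion error; the per-pixel equivalent, as the answers below use, goes through calc():
qq2sd <- calc(qq2stack, fun=sd)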
Each pixel corresponds to a single value. In an 8-bit grayscale image, the pixel value lies between 0 and 255, and it is proportional to the intensity of the light striking that point.
mean: simply divide the sum of pixel values by the total pixel count; for a set of images the count is len(df) * image_size * image_size, where len(df) is the number of images. standard deviation: keep a running sum of squared pixel values, psum_sq, and use total_std = sqrt(psum_sq / count - total_mean ** 2), i.e. the square root of the mean of the squares minus the square of the mean.
Standard deviation measures how far values deviate from their mean. When the standard deviation is near zero, the values all lie close to the mean; when it is high, the values are scattered and far from the mean.
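As a rough illustration, here is a minimal R sketch of that running-sums approach applied per pixel to a raster stack (assuming qq2stack from the question; the n - 1 divisor matches R's sd()):
n <- nlayers(qq2stack)
psum <- calc(qq2stack, sum)      # per-pixel sum of values
psumsq <- calc(qq2stack^2, sum)  # per-pixel sum of squared values
qq2sd <- sqrt((psumsq - psum^2 / n) / (n - 1))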
Unfortunately, as you note after my comment above, per-pixel analyses can be slow. The next thing to try is to parallelize the process. Assuming you have a multi-core processor, you can take advantage of raster's cluster support; note that calc() doesn't pick up the cluster on its own, so wrap it in clusterR():
cores <- 4
beginCluster(cores, type='SOCK')
qq2sd <- clusterR(qq2stack, calc, args=list(fun=sd))
endCluster()
This would result in a significant speed-up if your operating/hardware environment supports it. Obviously, you can increase the number of processes as well depending on your architecture.
To address the slow performance of raster::calc for calculating the standard deviation of large rasters, I wrote a gist (here) that does it using GDAL's gdal_calc.py. To calculate the sd of 5 rasters with 4 million cells each (2000 x 2000), gdal_calc takes about 2 seconds, compared to about 60 seconds for raster::calc (see example below).
You'll need gdal_calc.py on the system path so that it can be found by Sys.which, or else modify gdal_calc <- Sys.which('gdal_calc.py') to specify the full path to gdal_calc.py.
Note that, from memory, this function supports a maximum of 26 input rasters, since gdal_calc refers to each by a letter of the alphabet. Perhaps this constraint can be circumvented - I'm not familiar enough with gdalnumeric syntax to work it out.
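For reference, the command that gdal_sd assembles (see the source at the bottom of this answer) looks roughly like this for three hypothetical single-band inputs named r1.tif, r2.tif and r3.tif:
python gdal_calc.py -A r1.tif --A_band 1 -B r2.tif --B_band 1 -C r3.tif --C_band 1 --outfile=sd.tif --calc="std([A,B,C], 0, ddof=1)"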
E.g.: First, let's create a stack of five random 2000 x 2000 rasters. We'll use raster::calc on this stack, but we'll also write the layers out to temp files, because gdal_calc needs file paths:
s <- stack(replicate(5, raster(matrix(runif(4000000), 2000))))
ff <- replicate(5, tempfile(fileext='.tif')) # a vector of temp file paths
writeRaster(s, ff, bylayer=TRUE)
Here's the raster::calc version:
system.time(sd_calc <- calc(s, sd))
## user system elapsed
## 79.83 0.08 80.00
and the gdal_sd (i.e., using gdal_calc.py) version:
devtools::source_gist('61c8062938e05a4c6b92') # source the gdal_sd gist
system.time(gdal_sd(infile=ff, outfile=f <- tempfile(fileext='.tif')))
## Calculating standard deviation and writing to file17b0146640ac.tif
## user system elapsed
## 0.00 0.03 2.05
And a comparison of their values:
range(sd_calc[] - raster(f)[])
## [1] -1.110223e-16 1.110223e-16
Note that raster::calc is likely to be faster for small rasters, but clearly gdal_sd is a huge improvement for large rasters.
Documentation for the standard deviation function used by gdal_calc is given here. I've specified ddof=1 (i.e. delta degrees of freedom) such that results are consistent with those returned by R's sd.
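To illustrate the ddof distinction, a quick sanity check in base R (x is just made-up data):
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
sqrt(sum((x - mean(x))^2) / n)        # population SD (ddof=0, numpy's default)
sqrt(sum((x - mean(x))^2) / (n - 1))  # sample SD (ddof=1); identical to sd(x)
sd(x)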
For posterity, here is the source of gdal_sd as it was at the time of posting:
gdal_sd <- function(infile, outfile, quiet=TRUE) {
  # infile:  The multiband raster file (or a vector of paths to multiple raster
  #          files) for which to calculate cell standard deviations.
  # outfile: Path to raster output file.
  # quiet:   Logical. Should gdal_calc.py output be silenced?
  require(rgdal)
  gdal_calc <- Sys.which('gdal_calc.py')
  if(gdal_calc=='') stop('gdal_calc.py not found on system.')
  if(file.exists(outfile)) stop('outfile already exists.')
  # count the bands of each input file
  nbands <- sapply(infile, function(x) nrow(attr(GDALinfo(x), 'df')))
  if(length(infile) > 26 || any(nbands > 26)) stop('Maximum number of inputs is 26.')
  if(length(nbands) > 1 && any(nbands > 1))
    warning('One or more rasters have multiple bands. First band used.')
  if(length(infile)==1) {
    # single multiband file: assign a letter (A, B, ...) to each band
    inputs <- paste0('-', LETTERS[seq_len(nbands)], ' ', infile, ' --',
                     LETTERS[seq_len(nbands)], '_band ', seq_len(nbands), collapse=' ')
    n <- nbands
  } else {
    # multiple files: assign a letter to each file, using band 1 of each
    inputs <- paste0('-', LETTERS[seq_along(nbands)], ' ', infile, ' --',
                     LETTERS[seq_along(nbands)], '_band 1', collapse=' ')
    n <- length(infile)
  }
  message('Calculating standard deviation and writing to ', basename(outfile))
  cmd <- 'python %s %s --outfile=%s --calc="std([%s], 0, ddof=1)"'
  out <- system(
    sprintf(cmd, gdal_calc, inputs, outfile,
            paste0(LETTERS[seq_len(n)], collapse=',')),
    show.output.on.console=!quiet, intern=TRUE
  )
  if(any(grepl('Error', out))) stop(out, call.=FALSE) else NULL
}