Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identifying overlap zones in R raster package

Package:

  • raster

Data:

  • A rasterStack with 10 bands.
  • Each of the bands contains an image area surrounded by NAs
  • Bands are logical, i.e. "1" for image data and "0"/NA for surrounding area
  • The "image areas" of each band do not align completely with each other, though most have partial overlaps

Objective:

  • Write a fast function that can return either a rasterLayer or cell numbers for each "zone", for instance a pixel containing data only from bands 1 and 2 falls in zone 1, a pixel containing data only from bands 3 and 4 falls in zone 2, etc. If a rasterLayer is returned, I need to be able to match the zone value with band numbers later.

First attempt:

# Possible band combinations
values = integer(0)
for(i in 1:nlayers(myraster)){
 combs = combn(1:nlayers(myraster), i)
 for(j in 1:ncol(combs)){
  values = c(values, list(combs[,j]))
 }
}

# Define the zone finding function
find_zones = function(bands){

 # The intersection of the bands of interest
 a = subset(myraster, 1)
 values(a) = TRUE
 for(i in bands){
  a = a & myraster[[i]]
 }

 # Union of the remaining bands
 b = subset(myraster, 1)
 values(b) = FALSE
 for(i in seq(1:nlayers(myraster))[-bands]){
  b = b | myraster[[i]]
 }

 #plot(a & !b)
 cells = Which(a & !b, cells=TRUE)
 return(cells)
}

# Applying the function
results = lapply(values, find_zones)

My current function takes a very long time to execute. Can you think of a better way? Note that I don't simply want to know how many bands have data at each pixel, I also need to know which bands. The purpose of this is to process different the areas differently afterwards.

Note also that the real-life scenario is a 3000 x 3000 or more raster with potentially more than 10 bands.


EDIT

Some sample data consisting of 10 offset image areas:

# Sample data
library(raster)    
for(i in 1:10) {
  start_line = i*10*1000
  end_line = 1000000 - 800*1000 - start_line
  offset = i * 10
  data = c(rep(0,start_line), rep(c(rep(0,offset), rep(1,800), rep(0,200-offset)), 800), rep(0, end_line))
  current_layer = raster(nrows=1000, ncols=1000)
  values(current_layer) = data
  if(i == 1) {
    myraster = stack(current_layer)
  } else {
    myraster = addLayer(myraster, current_layer)
  }
}
NAvalue(myraster) = 0  # You may not want to do this depending on your solution...

Showing what the sample data looks like

like image 769
Benjamin Avatar asked Apr 14 '11 19:04

Benjamin


1 Answers

EDIT : Answer updated using Nick's trick and matrix multiplication.


You could try the following function, optimized by using Nick's trick and matrix multiplication. The bottleneck now is filling up stack with the seperate layers, but I guess the timings are quite OK now. Memory usage is a bit less, but given your data and the nature of R, I don't know if you can nibble of a bit without hampering the performance big time.

> system.time(T1 <- FindBands(myraster,return.stack=T))
   user  system elapsed 
   6.32    2.17    8.48 
> system.time(T2 <- FindBands(myraster,return.stack=F))
   user  system elapsed 
   1.58    0.02    1.59 
> system.time(results <- lapply(values, find_zones))
  Timing stopped at: 182.27 35.13 217.71

The function returns either a rasterStack with the different level combinations present in the plot (that's not all possible level combinations, so you have some gain there already), or a matrix with the level number and level names. This allows you to do something like :

levelnames <- attr(T2,"levels")[T2]

to get the level names for each cell point. As shown below, you can easily put that matrix inside a rasterLayer object.

The function :

 FindBands <- function(x,return.stack=F){
    dims <- dim(x)
    Values <- getValues(x)
    nn <- colnames(Values)

    vec <- 2^((1:dims[3])-1)
    #Get all combinations and the names
    id <- unlist(
                lapply(1:10,function(x) combn(1:10,x,simplify=F))
              ,recursive=F)

    nameid <- sapply(id,function(i){
      x <- sum(vec[i])
      names(x) <- paste(i,collapse="-")
      x
    })
    # Nicks approach
    layers <- Values %*% vec
    # Find out which levels we need
    LayerLevels <- unique(sort(layers))
    LayerNames <- c("No Layer",names(nameid[nameid %in% LayerLevels]))

    if(return.stack){
        myStack <- lapply(LayerLevels,function(i){
          r <- raster(nr=dims[1],nc=dims[2])
          r[] <- as.numeric(layers == i)
          r
          } )
        myStack <- stack(myStack)
        layerNames(myStack) <- LayerNames
        return(myStack)

    } else {

      LayerNumber <- match(layers,LayerLevels)
      LayerNumber <- matrix(LayerNumber,ncol=dims[2],byrow=T)
      attr(LayerNumber,"levels") <- LayerNames
      return(LayerNumber)
    }    
}

Proof of concept, using the data of RobertH :

r <- raster(nr=10, nc=10)
r[]=0
r[c(20:60,90:93)] <- 1
s <- list(r)
r[]=0
r[c(40:70,93:98)] <- 1
s <- c(s, r)
r[]=0
r[50:95] <- 1
s <- (c(s, r))
aRaster <- stack(s)


> X <- FindBands(aRaster,return.stack=T)
> plot(X)

enter image description here

> X <- FindBands(aRaster,return.stack=F)
> X
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    1    1    1    1    1    1    1    1     1
 [2,]    1    1    1    1    1    1    1    1    1     2
 [3,]    2    2    2    2    2    2    2    2    2     2
 [4,]    2    2    2    2    2    2    2    2    2     4
 [5,]    4    4    4    4    4    4    4    4    4     8
 [6,]    8    8    8    8    8    8    8    8    8     8
 [7,]    7    7    7    7    7    7    7    7    7     7
 [8,]    5    5    5    5    5    5    5    5    5     5
 [9,]    5    5    5    5    5    5    5    5    5     6
[10,]    6    6    8    7    7    3    3    3    1     1
attr(,"levels")
[1] "No Layer" "1"        "2"        "3"        "1-2"      "1-3"
       "2-3"      "1-2-3"   

> XX <- raster(ncol=10,nrow=10)
> XX[] <- X
> plot(XX)

enter image description here

like image 143
Joris Meys Avatar answered Oct 23 '22 13:10

Joris Meys