Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Calculating Number of Adjacent Identical Values in Data Frame or Array

I have a dataset of categorical substrate sizes from stream bottoms. Due to how I collected the data, I can spatially arrange them into a matrix where their relationship with their neighbors is preserved (ie, to the left, in front of, etc). An example would look like this:

     P.1 P.2 P.3 P.4 P.5
T 1    G   C   C   P   C
T 2    P   C   B   G   C
T 3   SI  SI   C   B   C
T 4   SI  BR  BR  SI  SI
T 5   BR  CL  BR  BR   B
T 6   BR  BR  BR  BR   C

Where P(n) is an actual point measurement on a transect across the stream from left to right, and T(n) gives transects from upstream to downstream. As you can see, some of the substrate types (notably bedrock, "BR", in this sample), have larger adjacent patches than others. This is ecologically meaningful, and maybe more so than just the percent of BR in the sample.

My question is this: Is there a simple way to calculate the number of substrate measurements of the same type which are adjacent to one another? Note that corner adjacent is also considered adjacent.

EDIT following very helpful commentary:

An example output would be a list of each type of patch, and the number of measurements in each patch. It might look like this:

$BR  
[1] 9  

$B  
[1] 1 1  

$C  
[1] 4 3 1  

$P  
[1] 1 1  

$G  
[1] 1 1  

$SI  
[1] 3 2  
like image 760
Loren Avatar asked Oct 21 '22 15:10

Loren


1 Answers

A fun little problem. I attach a solution, it should work on any matrix of factors. It is using the foreach and data.table packages so you might want to install those.

It works by first stacking the data and mapping each location to a value. Then it iterates through the original matrix doing a greedy self-recursion on the neighbors but first deleting itself (avoid counting itself multiple times) from the stacked matrix.

I do not like some of the for loops in this solution but given the speedup from interacting with the stacked frame I did not see an easy way around it without re-working this completely. A better implementation would run this in parallel threads (probably by patch type instead of location) using a package like synchronicity to put a mutex lock around the stacked data (anyone?).

dcast in the reshape2 package is also a good option for creating the stacked frame.

For this matrix:

> d
    P-1 P-2 P-3 P-4 P-5 P-6
T-1   G   P  SI  SI  BR  BR
T-2   C   C  SI  BR  CL  BR
T-3   C   B   C  BR  BR  BR
T-4   P   G   B  SI  BR  BR
T-5   C   C   C  SI   B   C

It gives the following result (which looks like what you were asking for):

> patchesList
$G
[1] 1 1
$C
[1] 4 3 1
$P
[1] 1 1
$B
[1] 2 1
$SI
[1] 3 2
$BR
[1] 9
$CL
[1] 1

The data set up code:

rm(list=ls())
d = strsplit("G   C   C   P   C P   C   B   G   C SI  SI   C   B   C SI  BR  BR  SI  SI BR  CL  BR  BR   B BR  BR  BR  BR   C"," ")[[1]]
d=d[-which(d=="")]
d=data.frame(matrix(d,nrow=5),stringsAsFactors=F)
rownames(d) = paste("T",1:5,sep="-")
colnames(d) = paste("P",1:6,sep="-")
levs = unique(unlist(d))

stacking the original data (with information on location):

idxsFrame = expand.grid(1:nrow(d),1:ncol(d))
colnames(idxsFrame) = c("ri","cj")
idxsFrame$value = apply(idxsFrame,1,function(x) { d[x[["ri"]],x[["cj"]]] } )
require(data.table)
idxsFrame = data.table(idxsFrame)

set up the output list:

patchesList = vector(mode="list",length=length(levs))
names(patchesList) = levs 
require(foreach)

the self-recursive function that does the scanning:

scanSurroundTiles = function(tile) 
{  
  surroundTiles = idxsFrame[ri>=(tile$ri-1) & ri <=(tile$ri+1) & cj>=(tile$cj-1) & cj<=(tile$cj+1),,drop=F]
  baseMatches = surroundTiles[which(surroundTiles$value == tile$value),,drop=F]  
  if(nrow(baseMatches) < 1) 
    return(tile)
  else
  {
    # not possible to do an apply(matches,1,scanSurroundTiles) because of overlap and self-recursiveness on deeper levels
    newMatches <- foreach(mc = 1:nrow(baseMatches), .combine=rbind) %do% # mc = 2; 
    {
      inIdxs = which(idxsFrame$ri==baseMatches$ri[mc] & idxsFrame$cj==baseMatches$cj[mc])
      if(length(inIdxs)>0)
      { assign("idxsFrame",idxsFrame[-inIdxs,,drop=F],globalenv()) 
        return(scanSurroundTiles(baseMatches[mc,,drop=F]))      
      } else
      { return(NULL) } # could have been removed from previous foreach 
    }
    return(rbind(tile,newMatches))
  }
}

the main loop:

for(i in 1:nrow(d))  
{
  for(j in 1:ncol(d)) 
  { 
    sourceTile = idxsFrame[ri==i & cj==j,,drop=F]
    if(nrow(sourceTile) > 0)
    {
      idxsFrame <- idxsFrame[-which(idxsFrame$ri==sourceTile$ri & idxsFrame$cj==sourceTile$cj),,drop=F]
      thisPatch = scanSurroundTiles(sourceTile)
# if you want to do some calc by patch (mean, sd) this is the place to do it by adding other info beyond the type in the stacked frame
      patchesList[[thisPatch$value[1]]] = c(patchesList[[thisPatch$value[1]]],nrow(thisPatch))      
    }  
  }
}
like image 197
Hans Roggeman Avatar answered Nov 03 '22 08:11

Hans Roggeman