
How do I remove matrices from a list that are duplicates within floating-point error?

This question is similar to ones that have been asked about floating-point error in other languages (for example here), but I haven't found a satisfactory solution.

I'm working on a project that involves investigating matrices that share certain characteristics. As part of that, I need to know how many matrices in a list are unique.

 D <- as.matrix(read.table("datasource",...))
 # Pre-allocate one zero matrix per sample, with the same dimensions as D
 mat_list <- lapply(seq_along(samples_list), function(i) matrix(data=0, nrow(D), ncol(D)))

This list is then populated by computations from the data based on the elements of samples_list. After mat_list has been populated, I need to remove duplicates. Running

mat_list <- unique(mat_list)

narrows things down quite a bit; however, many of those elements are really within machine error of each other. The function unique does not allow one to specify precision, and I was unable to find source code for modification.

One idea I had was this:

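ErrorReduction <- function(mat_list, tol=2){
  len <- length(mat_list)
  # seq_len(len - 1) avoids the 1:len-1 precedence trap, which evaluates to 0:(len-1)
  for(i in seq_len(len - 1)){
    # Difference between neighbouring matrices, recomputed on every iteration
    diff <- mat_list[[i]] - mat_list[[i+1]]
    # "i" selects the infinity norm (maximum absolute row sum)
    if(norm(diff, "i") < tol){
      # [[i]] copies the matrix itself; [i] would assign a one-element list
      mat_list[[i+1]] <- mat_list[[i]]
    }
  }
  mat_list <- unique(mat_list)
  return(mat_list)
}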

but this only compares each matrix with its immediate neighbor in the list. Comparing every pair with nested for loops would be simple but most likely inefficient.

What methods or ideas do you have for identifying and removing matrices that are within machine error of being duplicates?

Daniel Watkins asked Apr 29 '13 23:04


2 Answers

Here is a function that applies all.equal to every pair using outer and removes all duplicates:

approx.unique <- function(l) {
   is.equal.fun <- function(i, j)isTRUE(all.equal(norm(l[[i]] - l[[j]], "M"), 0))
   is.equal.mat <- outer(seq_along(l), seq_along(l), Vectorize(is.equal.fun))
   is.duplicate <- colSums(is.equal.mat * upper.tri(is.equal.mat)) > 0
   l[!is.duplicate]
}

An example:

a <- matrix(runif(12), 4, 3)
b <- matrix(runif(12), 4, 3)
c <- matrix(runif(12), 4, 3)

all <- list(a1 = a, b1 = b, a2 = a, a3 = a, b2 = b, c1 = c)

names(approx.unique(all))
# [1] "a1" "b1" "c1"
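Since the question asks for a way to specify the precision, here is a sketch of the same idea with an explicit tolerance. The name approx.unique.tol and the argument tol are hypothetical, and the default of 1e-8 is only an assumption; pick a value that suits the scale of your data. It compares each matrix only against the matrices already kept, so it can stop at the first match, although the worst case is still quadratic in the length of the list.

```r
# Hypothetical variant of approx.unique with a user-chosen absolute tolerance
approx.unique.tol <- function(l, tol = 1e-8) {
  keep <- integer(0)  # indices of the matrices kept as representatives
  for (i in seq_along(l)) {
    is.dup <- FALSE
    for (j in keep) {
      # "M" gives the largest absolute entry of the difference
      if (norm(l[[i]] - l[[j]], "M") < tol) {
        is.dup <- TRUE
        break
      }
    }
    if (!is.dup) keep <- c(keep, i)
  }
  l[keep]
}

x <- matrix(c(1, 2, 3, 4), 2, 2)
y <- x + 1e-12   # equal to x up to a tiny perturbation
z <- x + 1       # clearly different from x

names(approx.unique.tol(list(a = x, b = y, c = z)))
# [1] "a" "c"
```

Approximate equality is not transitive, so with a loose tolerance the result can depend on the order of the list.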
flodel answered Nov 15 '22 07:11


I believe you are looking for all.equal, which compares objects 'within machine error'. Check out ?all.equal.
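As a minimal illustration (the matrices m1 and m2 are made up for this sketch), all.equal treats two matrices that differ only by rounding noise as equal, while identical and unique do not. Because all.equal returns a character description rather than FALSE when the objects differ, it is usually wrapped in isTRUE:

```r
# Two matrices that differ only by floating-point rounding noise
m1 <- matrix(c(0.3, 0.6, 0.9, 1.2), 2, 2)
m2 <- matrix(c(0.1 + 0.2, 0.6, 0.9, 1.2), 2, 2)

identical(m1, m2)             # exact comparison
# [1] FALSE
isTRUE(all.equal(m1, m2))     # comparison within a tolerance (default about 1.5e-8)
# [1] TRUE
length(unique(list(m1, m2)))  # unique() keeps both matrices
# [1] 2
```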

Bryan Hanson answered Nov 15 '22 09:11