This question is similar to questions that have been asked regarding floating-point error in other languages (for example here), however I haven't found a satisfactory solution.
I'm working on a project that involves investigating matrices that share certain characteristics. As part of that, I need to know how many matrices in a list are unique.
D <- as.matrix(read.table("datasource",...))
mat_list <- vector('list',length=length(samples_list))
mat_list <- lapply(1:length(samples_list),function(i) matrix(data=0,nrow(D),ncol(D)))
This list is then populated by computations from the data based on the elements of samples_list
. After mat_list
has been populated, I need to removed duplicates. Running
mat_list <- unique(mat_list)
narrows things down quite a bit; however, many of those elements are really within machine error of each other. The function unique
does not allow one to specify precision, and I was unable to find source code for modification.
One idea I had was this:
ErrorReduction<-function(mat_list, tol=2){
len <- length(mat_list)
diff <- mat_list[[i]]-mat_list[[i+1]]
for(i in 1:len-1){
if(norm(diff,"i")<tol){
mat_list[[i+1]] <- mat_list[i]
}
}
mat_list<-unique(mat_list)
return(mat_list)
}
but this only looks at pairwise differences. It would be simple but most likely inefficient to do this with nested for
loops.
What methods do you know of, or what ideas do you have, of handling the problem of identifying and removing matrices that are within machine error of being duplicates?
Here is a function that applies all.equal
to every pair using outer
and removes all duplicates:
approx.unique <- function(l) {
is.equal.fun <- function(i, j)isTRUE(all.equal(norm(l[[i]] - l[[j]], "M"), 0))
is.equal.mat <- outer(seq_along(l), seq_along(l), Vectorize(is.equal.fun))
is.duplicate <- colSums(is.equal.mat * upper.tri(is.equal.mat)) > 0
l[!is.duplicate]
}
An example:
a <- matrix(runif(12), 4, 3)
b <- matrix(runif(12), 4, 3)
c <- matrix(runif(12), 4, 3)
all <- list(a1 = a, b1 = b, a2 = a, a3 = a, b2 = b, c1 = c)
names(approx.unique(all))
# [1] "a1" "b1" "c1"
I believe you are looking for all.equal
which compares objects 'within machine error'. Check out ?all.equal
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With