I'm trying just to calculate the Hamming distance between two vectors in R. I'm currently attempting to use the "e1071" package, and the hamming.distance function, as follows:
library(e1071)
H <- hamming.distance(X)
Where X is a data.frame with 2 rows and (in my particular data) 667 columns, and every observation is 0 or 1.
Initially I got the error:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
After some research, it appeared that one fix might be increasing the basic option in R. This I did via options(expressions=5000), and then tried varying values in place of the 5000. But this only produced the error:
Error: C stack usage is too close to the limit
I'm not much of a programmer, and the fixes for this most recent error appear to have to do with something inside the package e1071 possibly not being called correctly (or at the right time).
Any ideas on what I'm doing wrong? I eventually want the Hamming distances between a large number of vectors, and this was just a starting point. If this has to do with memory allocation, any suggestions for how to deal with it?
I don't know how hamming.distance
works internally, but a simple way to calculate the distance for 2 vectors is just
sum(x1 != x2)
or, in this case,
sum(X[1,] != X[2,])
If the total number of vectors is not too large (up to, say, a few thousand), you could implement this in a nested loop:
n <- nrow(X)
m <- matrix(nrow=n, ncol=n)
for(i in seq_len(n - 1))
for(j in seq(i, n))
m[j, i] <- m[i, j] <- sum(X[i,] != X[j,])
Caveat: untested.
WARNING ABOUT USING HAMMING.DISTANCE FROM PACKAGE e1071!
This package's implementation forces the objects being compared to booleans with as.logical. This means that values of 0 will be FALSE and any non-zero values will be TRUE. This means that for the sequence: 0 1 2 compared to 0 1 1 the hamming distance will be reported as 0 instead of the correct value of 1 -- this package will treat 1 and 2 as equal since as.logical(1) == as.logical(2).
Here is the faulty (in my view) implementation:
> library("e1071", lib.loc="C:/Program Files/R/R-2.15.3/library")
Loading required package: class
> hamming.distance
function (x, y)
{
z <- NULL
if (is.vector(x) && is.vector(y)) {
z <- sum(as.logical(x) != as.logical(y))
}
else {
z <- matrix(0, nrow = nrow(x), ncol = nrow(x))
for (k in 1:(nrow(x) - 1)) {
for (l in (k + 1):nrow(x)) {
z[k, l] <- hamming.distance(x[k, ], x[l, ])
z[l, k] <- z[k, l]
}
}
dimnames(z) <- list(dimnames(x)[[1]], dimnames(x)[[1]])
}
z
}
<environment: namespace:e1071>
My recommendation: DO NOT USE. Hamming distance is trivial to implement as noted several times above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With