R - How to get row & column subscripts of matched elements from a distance matrix

Tags: r, matrix, euclidean-distance, distance

I have an integer vector vec1 and I am generating a distance matrix from it with the dist function. I want to get the coordinates (row and column) of the elements that have a certain value in the distance matrix. Essentially, I would like to get the pairs of elements that are d units apart. For example:

vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)

#   1  2  3  4
#2  1         
#3  4  3      
#4 10  9  6   
#5 15 14 11  5

Say I am interested in the pairs of elements in the vector that are 5 units apart. I want to get coord1, the row subscripts, and coord2, the column subscripts, of the matching entries in the distance matrix. In this toy example, I would expect

coord1  
# [1] 5
coord2
# [1] 4

I am wondering if there is an efficient way to get these values that doesn't involve converting the dist object to a matrix or looping through the matrix?

asked Aug 17 '16 20:08

DKangeyan

1 Answer

A distance matrix produced by dist is a lower triangular matrix in packed format, where the lower triangle is stored column by column as a 1D vector. You can check this via

str(distMatrix)
# Class 'dist'  atomic [1:10] 1 4 10 15 3 9 14 6 11 5
# ...

Even if we call dist(vec1, diag = TRUE, upper = TRUE), the vector is still the same; only the printing style changes. That is, no matter how you call dist, you always get a vector.
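A quick way to see the packed layout is to compare the stored vector with the lower triangle of the full matrix. The snippet below is only an illustration: it builds the full matrix with as.matrix purely for the comparison, and the names d_default, d_full and m are made up for this example.

```r
vec1 <- c(2, 3, 6, 12, 17)
d_default <- dist(vec1)
d_full    <- dist(vec1, diag = TRUE, upper = TRUE)

# The underlying data is the same vector regardless of the printing options
identical(as.vector(d_default), as.vector(d_full))
# [1] TRUE

# The stored vector is the lower triangle of the full matrix, read column by column
m <- as.matrix(d_default)
identical(as.vector(d_default), m[lower.tri(m)])
# [1] TRUE
```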

This answer focuses on how to transform between 1D and 2D indices, so that you can work with a "dist" object without first making it a complete matrix using as.matrix. If you do want to make it a matrix, use the dist2mat function defined in the Q&A "as.matrix on a distance object is extremely slow; how to make it faster?".

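2D to 1D

[Picture not preserved: derivation of the 1D index k from the 2D index (i, j). Its MathJax source is given in the appendix.]

1D to 2D

[Picture not preserved: derivation of the 2D index (i, j) from the 1D index k. Its MathJax source is given in the appendix.]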

R functions

It is easy to write vectorized R functions for these index transforms. We only need to take some care with out-of-bound indices, for which NA should be returned.

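## 2D index to 1D index
f <- function (i, j, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  ## only strictly lower-triangular subscripts (i > j) within 1..n are valid
  valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
  ## elements in the previous (j - 1) columns, plus the offset within column j
  k <- (2 * n - j) * (j - 1) / 2 + (i - j)
  k[!valid] <- NA_real_
  k
}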

## 1D index to 2D index
finv <- function (k, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  ## a packed lower triangle holds n * (n - 1) / 2 elements
  valid <- (k >= 1) & (k <= n * (n - 1) / 2)
  k_valid <- k[valid]
  j <- rep.int(NA_real_, length(k))
  ## column index: floor of the smaller root of the quadratic in j
  j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1))) / 2)
  ## row index from k and j; it stays NA wherever j is NA
  i <- j + k - (2 * n - j) * (j - 1) / 2
  cbind(i, j)
}

These functions are extremely cheap in memory usage, as they work with indices instead of matrices.
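The two index formulas can be sanity-checked against base R's own column-by-column enumeration of the lower triangle. The following is a minimal sketch that inlines the formulas instead of calling f and finv, so that it runs on its own; n = 7 is an arbitrary choice and the variable names are made up for this check.

```r
n <- 7

# Row/column subscripts of the strict lower triangle, enumerated column by column
rc <- which(lower.tri(matrix(0, n, n)), arr.ind = TRUE)
i <- rc[, "row"]
j <- rc[, "col"]
k <- seq_len(n * (n - 1) / 2)

# 2D -> 1D: should reproduce 1, 2, ..., n * (n - 1) / 2
k_from_ij <- (2 * n - j) * (j - 1) / 2 + (i - j)

# 1D -> 2D: should reproduce the subscripts in rc
j_from_k <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k - 1))) / 2)
i_from_k <- j_from_k + k - (2 * n - j_from_k) * (j_from_k - 1) / 2

all(k_from_ij == k) && all(i_from_k == i) && all(j_from_k == j)
# [1] TRUE
```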

Applying `finv` to your question

You can use

vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)

finv(which(distMatrix == 5), distMatrix)
#     i j
#[1,] 5 4
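Since the question ultimately asks for the pair of elements, the subscripts can be used to index back into vec1. In this small sketch the subscripts are hard-coded from the output above so that the snippet runs on its own; the names ij and matched are hypothetical.

```r
vec1 <- c(2, 3, 6, 12, 17)

# Subscripts found above for distance 5 (hard-coded for this illustration)
ij <- cbind(i = 5, j = 4)

# One row per matched pair of elements
matched <- cbind(elem_i = vec1[ij[, "i"]], elem_j = vec1[ij[, "j"]])
matched
#      elem_i elem_j
# [1,]     17     12
```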

Generally speaking, a distance matrix contains floating point numbers, and it is risky to use == to judge whether two floating point numbers are equal. See the Q&A "Why are these numbers not equal?" for details and possible strategies.
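One common strategy is to compare within a tolerance instead of using ==. The following is a minimal sketch; the names target and tol and the tolerance value 1e-8 are assumptions made for illustration, and a suitable tolerance depends on the scale of your data.

```r
vec1 <- c(2, 3, 6, 12, 17)
distMatrix <- dist(vec1)

target <- 5
tol <- 1e-8  # assumed tolerance; choose one that suits your data

# 1D positions whose distance is within tol of the target
k <- which(abs(as.vector(distMatrix) - target) < tol)
k
# [1] 10
```

The resulting k can then be passed to finv(k, distMatrix) exactly as before.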

Alternative with `dist2mat`

Using the dist2mat function given in the Q&A "as.matrix on a distance object is extremely slow; how to make it faster?", we may use which(, arr.ind = TRUE). Because the full matrix is symmetric, each match is reported twice.

library(Rcpp)
sourceCpp("dist2mat.cpp")
mat <- dist2mat(distMatrix, 128)
which(mat == 5, arr.ind = TRUE)
#  row col
#5   5   4
#4   4   5
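If only one copy of each pair is wanted, the mirrored hits can be dropped by keeping the rows with row > col. The sketch below uses base R's as.matrix purely as a stand-in for dist2mat so that it runs without the C++ source; the names hits and lower_hits are made up for this example.

```r
vec1 <- c(2, 3, 6, 12, 17)
distMatrix <- dist(vec1)

# as.matrix() is used here only as a stand-in for dist2mat()
mat <- as.matrix(distMatrix)
hits <- which(mat == 5, arr.ind = TRUE)

# Keep the copy of each pair that lies in the lower triangle (row > col)
lower_hits <- hits[hits[, "row"] > hits[, "col"], , drop = FALSE]
lower_hits
#   row col
# 5   5   4
```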

Appendix: Markdown (needs MathJax support) for the picture

## 2D index to 1D index

The lower triangle looks like this: $$\begin{pmatrix} 0 & 0 & \cdots & 0\\ \times & 0 & \cdots & 0\\ \times & \times & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ \times & \times & \cdots & 0\end{pmatrix}$$ If the matrix is $n \times n$, then there are $(n - 1)$ elements ("$\times$") in the 1st column, and $(n - j)$ elements in the $j$<sup>th</sup> column. Thus, for element $(i,\  j)$ (with $i > j$, $j < n$) in the lower triangle, there are $$(n - 1) + \cdots + (n - (j - 1)) = \frac{(2n - j)(j - 1)}{2}$$ "$\times$" in the previous $(j - 1)$ columns, and it is the $(i - j)$<sup>th</sup> "$\times$" in the $j$<sup>th</sup> column. So it is the $$\left\{\frac{(2n - j)(j - 1)}{2} + (i - j)\right\}^{\textit{th}}$$ "$\times$" in the lower triangle.
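
As a quick check with the data from the question ($n = 5$), the element $(i,\ j) = (5,\ 4)$ gets the index $$\frac{(2 \cdot 5 - 4)(4 - 1)}{2} + (5 - 4) = 9 + 1 = 10,$$ which agrees with the distance 5 being the 10<sup>th</sup> entry of the vector shown by `str(distMatrix)`.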

----

## 1D index to 2D index

Now for the $k$<sup>th</sup> "$\times$" in the lower triangle, how can we find its matrix index $(i,\ j)$? We take two steps: 1) find $j$; 2) obtain $i$ from $k$ and $j$.

The first "$\times$" of the $j$<sup>th</sup> column, i.e., $(j + 1,\ j)$, is the $\left\{\frac{(2n - j)(j - 1)}{2} + 1\right\}^{\textit{th}}$ "$\times$" of the lower triangle, thus $j$ is the maximum value such that $\frac{(2n - j)(j - 1)}{2} + 1 \leq k$. This is equivalent to finding the max $j$ so that $$j^2 - (2n + 1)j + 2(k + n - 1) \geq 0.$$ The LHS is a quadratic polynomial in $j$, and it is easy to see that the solution is the largest integer no larger than its first root (i.e., the smaller root, on the left side): $$j = \left\lfloor\frac{(2n + 1) - \sqrt{(2n-1)^2 - 8(k-1)}}{2}\right\rfloor.$$ Then $i$ can be obtained from $$i = j + k - \left\{\frac{(2n - j)(j - 1)}{2}\right\}.$$
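
As a quick check with the data from the question ($n = 5$), take $k = 10$: $$j = \left\lfloor\frac{11 - \sqrt{9^2 - 8 \cdot 9}}{2}\right\rfloor = \left\lfloor\frac{11 - 3}{2}\right\rfloor = 4, \qquad i = 4 + 10 - \frac{(10 - 4)(4 - 1)}{2} = 14 - 9 = 5,$$ which recovers the index $(5,\ 4)$ reported by `finv` for the distance 5.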

answered Oct 24 '22 11:10

Zheyuan Li
