
Convert an object of class "dist" into a data frame in R

It is possible to convert a data frame to an object of class "dist". Is it possible to do just the opposite, that is, convert an object of class "dist" to a data frame? For example:

> dist(hasil)

            1           2           3           4
2 0.088814413
3 0.084929382 0.030413813
4 0.063245553 0.029120440 0.044418465
5 0.061983869 0.027018512 0.036400549 0.009055385

and the desired result as a data frame:

   col          row          distance
   1            2            0.088814413
   1            3            0.084929382          
   1            4            0.063245553
   1            5            0.061983869
   2            3            0.030413813
   2            4            0.029120440
   2            5            0.027018512
   3            4            0.044418465
   3            5            0.036400549
   4            5            0.009055385
Nadina asked May 05 '14 14:05




2 Answers

library(maps)
data(us.cities)

# distances between the first six cities, based on latitude and longitude
d <- dist(head(us.cities[c("lat", "long")]))
d

##           1         2         3         4         5
## 2 20.160489                                        
## 3 23.139853 40.874243                              
## 4 15.584303  9.865374 38.579820                    
## 5 27.880674  7.882037 48.707100 15.189882          
## 6 26.331187 41.720457  6.900101 41.036931 49.328558

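library(reshape2)

# expand to a full square matrix, then melt it into long format
df <- melt(as.matrix(d), varnames = c("row", "col"))

# keep only the lower triangle, so each pair appears once
df[df$row > df$col,]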
##    row col     value
## 2    2   1 20.160489
## 3    3   1 23.139853
## 4    4   1 15.584303
## 5    5   1 27.880674
## 6    6   1 26.331187
## 9    3   2 40.874243
## 10   4   2  9.865374
## 11   5   2  7.882037
## 12   6   2 41.720457
## 16   4   3 38.579820
## 17   5   3 48.707100
## 18   6   3  6.900101
## 23   5   4 15.189882
## 24   6   4 41.036931
## 30   6   5 49.328558
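
The same lower-triangle filter can also be sketched without reshape2, using only base R. This is an illustrative alternative, not part of the answer above; the three points in `pts` are made-up data and the variable names are hypothetical.

```r
# Made-up data: three points in the plane, so the distances are easy to check
pts <- matrix(c(0, 0,
                3, 4,
                6, 8), ncol = 2, byrow = TRUE)
d <- dist(pts)

m   <- as.matrix(d)                          # full symmetric distance matrix
idx <- which(lower.tri(m), arr.ind = TRUE)   # (row, col) positions below the diagonal

df <- data.frame(row   = unname(idx[, "row"]),
                 col   = unname(idx[, "col"]),
                 value = m[idx])             # look up each (row, col) pair
df
```

If the "dist" object carries labels, `rownames(m)[idx[, "row"]]` and `colnames(m)[idx[, "col"]]` should give the label for each pair instead of its position.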
Jake Burkhead answered Oct 23 '22 13:10


I would actually write a function something like this:

myFun <- function(inDist) {
  if (!inherits(inDist, "dist")) stop("wrong input type")
  A <- attr(inDist, "Size")
  B <- if (is.null(attr(inDist, "Labels"))) sequence(A) else attr(inDist, "Labels")
  if (isTRUE(attr(inDist, "Diag"))) attr(inDist, "Diag") <- FALSE
  if (isTRUE(attr(inDist, "Upper"))) attr(inDist, "Upper") <- FALSE
  data.frame(
    row = B[unlist(lapply(sequence(A)[-1], function(x) x:A))],
    col = rep(B[-length(B)], (length(B)-1):1),
    value = as.vector(inDist))
}
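
The index arithmetic in the function works because a "dist" object stores the lower triangle as a plain vector, column by column. A small sketch of that layout, using made-up points on a line and hypothetical names `row_idx` and `col_idx`:

```r
# Made-up data: four points on a line at 0, 1, 3 and 6
d <- dist(matrix(c(0, 1, 3, 6), ncol = 1))
n <- attr(d, "Size")

# Row and column position of every stored element, in storage order
row_idx <- unlist(lapply(2:n, function(i) i:n))   # 2 3 4 3 4 4
col_idx <- rep(seq_len(n - 1), (n - 1):1)         # 1 1 1 2 2 3

cbind(row = row_idx, col = col_idx, value = as.vector(d))
```

Each row of the result pairs a stored distance with the matrix cell it would occupy when the object is printed.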

Now, imagine we are starting with (note the non-numeric row and column names):

dd <- as.dist((1 - cor(USJudgeRatings)[1:5, 1:5])/2)
#            CONT       INTG       DMNR       DILG
# INTG 0.56659545                                 
# DMNR 0.57684427 0.01769236                      
# DILG 0.49380400 0.06424445 0.08157452           
# CFMG 0.43154385 0.09295712 0.09332092 0.02060062

We can convert it with a single call:

myFun(dd)
#     row  col      value
# 1  INTG CONT 0.56659545
# 2  DMNR CONT 0.57684427
# 3  DILG CONT 0.49380400
# 4  CFMG CONT 0.43154385
# 5  DMNR INTG 0.01769236
# 6  DILG INTG 0.06424445
# 7  CFMG INTG 0.09295712
# 8  DILG DMNR 0.08157452
# 9  CFMG DMNR 0.09332092
# 10 CFMG DILG 0.02060062

A quick performance comparison:

set.seed(1)
x <- matrix(rnorm(1000*1000), nrow = 1000)
dd <- dist(x)

## Jake's function (needs library(reshape2) for melt)
fun2 <- function(inDist) {
  df <- melt(as.matrix(inDist), varnames = c("row", "col"))
  df[as.numeric(df$row) > as.numeric(df$col), ]
}

all(fun2(dd) == myFun(dd))
# [1] TRUE
system.time(fun2(dd))
#    user  system elapsed 
#   0.346   0.002   0.349 
system.time(myFun(dd))
#    user  system elapsed 
#   0.012   0.000   0.015
A5C1D2H2I1M1N2O1R2T1 answered Oct 23 '22 14:10