Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reconstruct symmetric matrix from values in long-form

Tags:

r

matrix

reshape

I have a tsv that looks like this (long-form):

  one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30

I'm trying to get this into a dataframe for R (or pandas)

    a  b  c  d 
a   00 30 40 20
b   30 00 10 05 
c   40 10 00 30
d   20 05 30 00

The problem is, in my tsv I only have a, b defined and not b,a. So I get a lot of NAs in my dataframe.

The final goal is to get a distance matrix to use in clustering. Any help would be appreciated.

like image 308
jwillis0720 Avatar asked Jun 28 '15 01:06

jwillis0720


2 Answers

An igraph solution where you read in the dataframe, with the value assumed as edge weights. You can then convert this to an adjacency matrix

dat <- read.table(header=T, text=" one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30")

library(igraph)

# Make undirected so that graph matrix will be symmetric
g <- graph.data.frame(dat, directed=FALSE)

# add value as a weight attribute
get.adjacency(g, attr="value", sparse=FALSE)
#   a  b  c  d
#a  0 30 40 20
#b 30  0 10  5
#c 40 10  0 30
#d 20  5 30  0
like image 53
user20650 Avatar answered Oct 02 '22 00:10

user20650


Yet another approach is reshape::cast

df.long = data.frame(one=c('a','a','a','b','b','c'),
                     two=c('b','c','d','c','d','d'),
                     value=c(30,40,20,10,05,30) )

# cast will recover the upper/lower-triangles...
df <- as.matrix( cast(df.long, one ~ two, fill=0) )
#    b  c  d
# a 30 40 20
# b  0 10  5
# c  0  0 30

So we construct matrix with full indices, and insert:

df <- matrix(nrow=length(indices), ncol=length(indices),dimnames = list(indices,indices))    
diag(df) <- 0
# once we assure that the full upper-triangle is present and in sorted order (as Robert's answer does), then we
df[upper.tri(df)] <- as.matrix( cast(df.long, one ~ two, fill=0) )
df[lower.tri(df)] <- df[upper.tri(df)]

UPDATE: the original sketch included these manual kludges

Then the same approaches to add the missing row 'd' and column 'a', and fill the lower triangle by adding the transpose t(df) :

df <- cbind(a=rep(0,4), rbind(df, d=rep(0,3)))
#   a  b  c  d
# a 0 30 40 20
# b 0  0 10  5
# c 0  0  0 30
# d 0  0  0  0

df + t(df)
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0
like image 37
smci Avatar answered Oct 02 '22 00:10

smci