 

creating a square matrix from a data frame [duplicate]

I'm having trouble turning my data.frame into a square matrix. My data looks something like this:

  var1 var2 value
    A    B     4
    C    D     5 
    D    A     2
    B    D     1

I'm trying to transform the data.frame to a matrix that looks like this:

    A    B    C   D
  A 0    4    0   2
  B 4    0    0   1
  C 0    0    0   5
  D 2    1    5   0

I tried many functions from different packages available in R but still cannot find a solution.

asked Oct 04 '17 at 10:10 by Brenna



2 Answers

Here is a base R method using matrix indexing on character vectors.

## set up storage matrix
# get names for row and columns
nameVals <- sort(unique(unlist(dat[1:2])))
# construct 0 matrix of correct dimensions with row and column names
myMat <- matrix(0, length(nameVals), length(nameVals), dimnames = list(nameVals, nameVals))

# fill in the matrix with matrix indexing on row and column names
myMat[as.matrix(dat[c("var1", "var2")])] <- dat[["value"]]

This returns

myMat
  A B C D
A 0 4 0 0
B 0 0 0 1
C 0 0 0 5
D 2 0 0 0

For details on how this powerful form of indexing works, see the Matrices and arrays section of the help file ?"[". In particular, the fourth paragraph of the section discusses this form of indexing.
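As a tiny illustration of this kind of indexing (a sketch with made-up names; `m` and `idx` are illustrative, not from the answer): each row of a two-column character matrix names one (row name, column name) cell, and the same matrix can be used both to assign and to read values.

```r
# A 2 x 3 zero matrix with row and column names
m <- matrix(0, nrow = 2, ncol = 3,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Each row of idx is one (row name, column name) pair
idx <- cbind(c("r1", "r2"), c("c3", "c1"))

m[idx] <- c(10, 20)  # sets m["r1", "c3"] and m["r2", "c1"]
m[idx]               # reads the same two cells back: 10 20
```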

Note that I assume the first two variables are character vectors rather than factors. This makes it a bit easier, since I don't have to use as.character to coerce them.

To convert the result to a data.frame, wrap it in as.data.frame(), e.g. as.data.frame(myMat).

Data:

dat <- 
structure(list(var1 = c("A", "C", "D", "B"), var2 = c("B", "D", 
"A", "D"), value = c(4L, 5L, 2L, 1L)), .Names = c("var1", "var2", 
"value"), class = "data.frame", row.names = c(NA, -4L))
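The result above only fills the (var1, var2) cell of each pair, so it is not symmetric like the matrix in the question. A minimal sketch of one way to get the symmetric version with the same matrix-indexing idea (not part of the original answer) is to assign the values a second time with the two index columns swapped. It assumes each unordered pair appears at most once; if a pair occurred in both directions, the second assignment would overwrite rather than add. `idx` is an illustrative name.

```r
# Same data as above, written with data.frame() instead of dput output
dat <- data.frame(var1 = c("A", "C", "D", "B"),
                  var2 = c("B", "D", "A", "D"),
                  value = c(4L, 5L, 2L, 1L),
                  stringsAsFactors = FALSE)

# Sorted union of all names, used for both rows and columns
nameVals <- sort(unique(unlist(dat[1:2], use.names = FALSE)))
myMat <- matrix(0, length(nameVals), length(nameVals),
                dimnames = list(nameVals, nameVals))

# Two-column character matrix of (row, column) names
idx <- as.matrix(dat[c("var1", "var2")])

myMat[idx] <- dat[["value"]]         # fill (var1, var2)
myMat[idx[, 2:1]] <- dat[["value"]]  # mirror into (var2, var1)
myMat
```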
answered Oct 08 '22 at 19:10 by lmo



If we convert all the character columns to factors with levels 'A', 'B', 'C', 'D', then we can use xtabs without dropping any columns.

Unfortunately, the resulting matrix isn't symmetric.
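To see why the shared levels matter, here is a small base R comparison (a sketch; `tab_chr`, `tab_fct`, `lev` and `df_fct` are illustrative names). With plain character columns, xtabs builds each factor from the values present in that column, and since "C" never appears in var2, the table gets no "C" column.

```r
df <- data.frame(var1 = c("A", "C", "D", "B"),
                 var2 = c("B", "D", "A", "D"),
                 value = c(4, 5, 2, 1),
                 stringsAsFactors = FALSE)

# Character columns: levels come from each column separately
tab_chr <- xtabs(value ~ var1 + var2, data = df)
dim(tab_chr)  # 4 rows, 3 columns (no "C" column)

# Shared explicit levels: every name gets both a row and a column
lev <- c("A", "B", "C", "D")
df_fct <- transform(df,
                    var1 = factor(var1, levels = lev),
                    var2 = factor(var2, levels = lev))
tab_fct <- xtabs(value ~ var1 + var2, data = df_fct)
dim(tab_fct)  # 4 rows, 4 columns
```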

library('tidyverse')

df <- tribble(
  ~var1, ~var2, ~value,
    'A',   'B',      4,
    'C',   'D',      5,
    'D',   'A',      2,
    'B',   'D',      1
)

df %>%
  mutate_if(is.character, factor, levels=c('A', 'B', 'C', 'D')) %>%
  xtabs(value ~ var1 + var2, ., drop.unused.levels = FALSE)
#     var2
# var1 A B C D
#    A 0 4 0 0
#    B 0 0 0 1
#    C 0 0 0 5
#    D 2 0 0 0

To make it symmetric, I just added its transpose to itself. This feels like a bit of a hack, though.

df %>%
  mutate_if(is.character, factor, levels=c('A', 'B', 'C', 'D')) %>%
  xtabs(value ~ var1 + var2, ., drop.unused.levels = FALSE) %>%
  `+`(., t(.))
#     var2
# var1 A B C D
#    A 0 4 0 2
#    B 4 0 0 1
#    C 0 0 0 5
#    D 2 1 5 0
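One caveat with adding the transpose: a diagonal cell (from a row such as A, A) would be counted twice, and a pair entered in both directions would be summed. Below is a base R sketch of the same idea without the tidyverse that puts the original diagonal back after symmetrizing. Treating summed duplicate pairs as acceptable is an assumption, and `lev`, `tab` and `sym` are illustrative names.

```r
# Base R sketch of the xtabs + transpose approach
df <- data.frame(var1 = c("A", "C", "D", "B"),
                 var2 = c("B", "D", "A", "D"),
                 value = c(4, 5, 2, 1),
                 stringsAsFactors = FALSE)

# Give both columns the same levels so the table is square
lev <- sort(unique(c(df$var1, df$var2)))
df$var1 <- factor(df$var1, levels = lev)
df$var2 <- factor(df$var2, levels = lev)

tab <- xtabs(value ~ var1 + var2, data = df)

# Mirror every value across the diagonal
sym <- tab + t(tab)

# tab + t(tab) doubles diagonal cells (pairs such as A-A),
# so restore the original diagonal
diag(sym) <- diag(tab)
sym
```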
answered Oct 08 '22 at 20:10 by Paul
