 

Reshape three column data frame to matrix ("long" to "wide" format) [duplicate]

I have a data.frame that looks like this.

x a 1 
x b 2 
x c 3 
y a 3 
y b 3 
y c 2 

I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:

    a    b    c
x   1    2    3
y   3    3    2

I have tried cast from the reshape package, and I have tried writing a function by hand to do this, but I cannot seem to get it right.

asked Mar 08 '12 12:03 by MalteseUnderdog




2 Answers

There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.

# example data: x = x x x y y y, y = a b c a b c, z = the values
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
                  y=gl(3,1,6, labels=letters[1:3]), 
                  z=c(1,2,3,3,3,2))

Using the tidyverse:

The cool new way to do this is with pivot_wider from tidyr 1.0.0. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.

library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x         a     b     c
## <fct> <dbl> <dbl> <dbl>
## 1 x       1     2     3
## 2 y       3     3     2
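For the heatmap case, here is a minimal sketch of that conversion: keep the value columns as a numeric matrix and move the id column into the row names. So that it runs without tidyr, `wide` is typed by hand as a stand-in for the `pivot_wider()` result above; the same steps should also work on the tibble itself.

```r
# Hand-typed stand-in for the pivot_wider() result above
wide <- data.frame(x = factor(c("x", "y")),
                   a = c(1, 3), b = c(2, 3), c = c(3, 2))

# Keep only the value columns as a numeric matrix ...
m <- as.matrix(wide[, -1])
# ... and use the id column for the row names
rownames(m) <- as.character(wide$x)
m
##   a b c
## x 1 2 3
## y 3 3 2

# heatmap() needs a numeric matrix with at least 2 rows and 2 columns;
# Rowv = NA and Colv = NA suppress the dendrograms and the reordering
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```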

The old cool new way to do this is with spread from tidyr. It similarly returns a data frame.

library(tidyr)
spread(tmp, y, z)
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using reshape2:

One of the first steps toward the tidyverse was the reshape2 package.

To get a matrix use acast:

library(reshape2)
acast(tmp, x~y, value.var="z")
##   a b c
## x 1 2 3
## y 3 3 2

Or to get a data frame, use dcast, as here: Reshape data for values in one column.

dcast(tmp, x~y, value.var="z")
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using plyr:

Also from the pre-tidyverse era is plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673

library(plyr)
daply(tmp, .(x, y), function(x) x$z)
##    y
## x   a b c
##   x 1 2 3
##   y 3 3 2

Using matrix indexing:

This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.

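with(tmp, {
  # empty matrix, one row per level of x and one column per level of y
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  # cbind() on factors gives integer codes: one (row, col) position per row
  out[cbind(x, y)] <- z
  out
})
##   a b c
## x 1 2 3
## y 3 3 2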
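A small variant of the same technique (an illustration, not part of the original answer): the index matrix can also be a two-column character matrix, which R matches against the dimnames, so the result does not depend on the factors' integer codes. The same kind of index also reads cells back out.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# Empty result; NA marks combinations that never occur in the data
out <- matrix(NA_real_, nrow = nlevels(tmp$x), ncol = nlevels(tmp$y),
              dimnames = list(levels(tmp$x), levels(tmp$y)))

# Each row of idx names one (row, column) cell of out by its dimnames
idx <- cbind(as.character(tmp$x), as.character(tmp$y))
out[idx] <- tmp$z

# Reading cells (x, a) and (y, c) back out, one value per index row
out[cbind(c("x", "y"), c("a", "c"))]
## [1] 1 2
```

If an x/y pair occurs more than once, the later value silently overwrites the earlier one, so check for duplicates first if that matters.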

Using xtabs:

xtabs(z~x+y, data=tmp)
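Note that xtabs sums z over repeated x/y pairs, fills combinations that never occur with 0, and returns an "xtabs"/"table" object rather than a plain matrix. A small sketch with made-up data (`tmp2` is illustrative) showing both behaviours and a conversion to an ordinary numeric matrix:

```r
# Illustrative data: (x, a) appears twice; (x, c) and (y, b) never appear
tmp2 <- data.frame(x = c("x", "x", "x", "y", "y"),
                   y = c("a", "a", "b", "a", "c"),
                   z = c(1, 10, 2, 3, 2))

tab <- xtabs(z ~ x + y, data = tmp2)
class(tab)
## [1] "xtabs" "table"

# matrix() drops the table class; dimnames are copied over explicitly
m <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
m["x", "a"]   # 1 + 10
## [1] 11
m["y", "b"]   # combination absent from the data
## [1] 0
```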

Using a sparse matrix:

There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names

library(Matrix)
with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
                       dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
##   a b c
## x 1 2 3
## y 3 3 2
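One caveat, sketched below with made-up data (`tmp3`, where level c of y never occurs): without `dims`, sparseMatrix infers the dimensions from the largest `i` and `j`, which then no longer match the dimnames, so passing `dims` explicitly is safer. Cells absent from the data are stored as zeros (repeated i/j pairs are summed by default), and `as.matrix()` gives an ordinary dense matrix. This assumes the Matrix package, a recommended package that ships with R.

```r
library(Matrix)

# Illustrative data: level "c" of y is never used
tmp3 <- data.frame(x = factor(c("x", "x", "y")),
                   y = factor(c("a", "b", "a"), levels = c("a", "b", "c")),
                   z = c(1, 2, 3))

sm <- with(tmp3, sparseMatrix(i = as.integer(x), j = as.integer(y), x = z,
                              dims = c(nlevels(x), nlevels(y)),
                              dimnames = list(levels(x), levels(y))))

# Dense copy: cells absent from the data become 0
dense <- as.matrix(sm)
dense
##   a b c
## x 1 2 0
## y 3 0 0
```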

Using reshape:

You can also use the base R function reshape, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove the extra x column and get the names right (not shown).

reshape(tmp, idvar="x", timevar="y", direction="wide")
##   x z.a z.b z.c
## 1 x   1   2   3
## 4 y   3   3   2
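The clean-up step that is not shown could look like this minimal sketch: drop the x column, use it for the row names, and strip the "z." prefix that reshape adds to the new column names.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

m <- as.matrix(wide[, -1])                    # value columns only
rownames(m) <- as.character(wide$x)           # ids become row names
colnames(m) <- sub("^z\\.", "", colnames(m))  # "z.a" -> "a", etc.
m
##   a b c
## x 1 2 3
## y 3 3 2
```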
answered Sep 21 '22 21:09 by Aaron left Stack Overflow


The question is some years old, but maybe some people are still interested in alternative answers.

If you don't want to load any packages, you might use this function:

#' Converts three columns of a data.frame into a matrix -- e.g. to plot 
#' the data via image() later on. Two of the columns form the row and
#' col dimensions of the matrix. The third column provides values for
#' the matrix.
#' 
#' @param data data.frame: input data
#' @param rowtitle string: row-dimension; name of the column in data whose distinct values should be used as row names in the output matrix
#' @param coltitle string: col-dimension; name of the column in data whose distinct values should be used as column names in the output matrix
#' @param datatitle string: name of the column in data whose values should be filled into the output matrix
#' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
#' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
#' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
#' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[[coltitle]]
#' @author Daniel Neumann
#' @date 2017-08-29
data.frame2matrix = function(data, rowtitle, coltitle, datatitle, 
                             rowdecreasing = FALSE, coldecreasing = FALSE,
                             default_value = NA) {

  # check whether the titles exist as column names in the data.frame data
  if ( (!(rowtitle%in%names(data))) 
       || (!(coltitle%in%names(data))) 
       || (!(datatitle%in%names(data))) ) {
    stop('data.frame2matrix: bad row-, col-, or datatitle.')
  }

  # get number of rows in data
  ndata = dim(data)[1]

  # extract rownames and colnames for the matrix from the data.frame
  rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  nrows = length(rownames)
  colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
  ncols = length(colnames)

  # initialize the matrix
  out_matrix = matrix(NA, 
                      nrow = nrows, ncol = ncols,
                      dimnames=list(rownames, colnames))

  # iterate rows of data (seq_len() skips the loop for an empty data.frame, unlike 1:0)
  for (i1 in seq_len(ndata)) {
    # get matrix-row and matrix-column indices for the current data-row
    iR = which(rownames==data[[rowtitle]][i1])
    iC = which(colnames==data[[coltitle]][i1])

    # throw an error if the matrix entry (iR,iC) is already filled.
    if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
    out_matrix[iR, iC] = data[[datatitle]][i1]
  }

  # set empty matrix entries to the default value
  out_matrix[is.na(out_matrix)] = default_value

  # return matrix
  return(out_matrix)

}

How it works:

myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
                            'dim2'=c('a','b','c','a','b','c'),
                            'values'=c(1,2,3,3,3,2))) 

myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')

myMatrix
>   a b c
> x 1 2 3
> y 3 3 2
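The loop calls which() twice per data row, which can get slow on large inputs. Below is a hypothetical vectorised variant (`data.frame2matrix_vec` is an illustrative name, not from the answer) that looks up all positions at once with match() and fills the matrix by matrix indexing. One behavioural difference: here default_value fills only cells absent from the data, whereas the loop version also replaces NA values that were present in the data.

```r
# Hypothetical vectorised variant of data.frame2matrix() (illustrative name)
data.frame2matrix_vec <- function(data, rowtitle, coltitle, datatitle,
                                  rowdecreasing = FALSE, coldecreasing = FALSE,
                                  default_value = NA) {
  if (!all(c(rowtitle, coltitle, datatitle) %in% names(data))) {
    stop("data.frame2matrix_vec: bad row-, col-, or datatitle.")
  }
  rn <- sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  cn <- sort(unique(data[[coltitle]]), decreasing = coldecreasing)

  # (row, column) position of every data row in the output matrix
  idx <- cbind(match(data[[rowtitle]], rn), match(data[[coltitle]], cn))
  if (anyDuplicated(idx) > 0) {
    stop("data.frame2matrix_vec: double entry in data.frame")
  }

  # Start from default_value so that only absent cells keep it
  out <- matrix(default_value, nrow = length(rn), ncol = length(cn),
                dimnames = list(as.character(rn), as.character(cn)))
  out[idx] <- data[[datatitle]]
  out
}

myData <- data.frame(dim1 = c("x", "x", "x", "y", "y", "y"),
                     dim2 = c("a", "b", "c", "a", "b", "c"),
                     values = c(1, 2, 3, 3, 3, 2))
data.frame2matrix_vec(myData, "dim1", "dim2", "values")
##   a b c
## x 1 2 3
## y 3 3 2
```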
answered Sep 18 '22 21:09 by daniel.heydebreck