R: include factors with no entries when using dcast

Tags: r, reshape2

I am using the reshape2 function dcast on a data frame. One of the variables is a factor where some of the levels do not appear in the data, but I would like to include all levels in the new columns that are created.

For example, say I run the following:

library(reshape2)
dataDF <- data.frame(
  id = 1:6,
  id2 = c(1,2,3,1,2,3),
  x = c(rep('t1', 3), rep('t2', 3)),
  y = factor(c('A', 'B', 'A', 'B', 'B', 'C'), levels = c('A', 'B', 'C', 'D')),
  value = rep(1)
)

dcast(dataDF, id + id2 ~ x + y, fill = 0)

I get the following:

  id id2 t1_A t1_B t2_B t2_C
1  1   1    1    0    0    0
2  2   2    0    1    0    0
3  3   3    1    0    0    0
4  4   1    0    0    1    0
5  5   2    0    0    1    0
6  6   3    0    0    0    1

But I also want to include the columns t1_C, t1_D, t2_A and t2_D, filled with 0's.

That is, I want the following:

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0

Also, as an aside, would it be possible to create the above without having the column 'value' full of ones in the initial data frame? Basically, I just want to cast x and y into their own columns, with a 1 if that combination exists for that id.

Thanks in advance

EDIT: I initially had one variable on the LHS, which Jeremy answered below, but I actually have more than one variable on the LHS, so I edited the question to reflect this.

user1165199 asked Oct 09 '15 14:10


1 Answer

Try adding drop = FALSE to your dcast call, so that unused factor levels are not dropped:

dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)

  id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1    1    0    0    0    0    0    0    0
2  2    0    1    0    0    0    0    0    0
3  3    1    0    0    0    0    0    0    0
4  4    0    0    0    0    0    1    0    0
5  5    0    0    0    0    0    1    0    0
6  6    0    0    0    0    0    0    1    0
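
With more than one variable on the left-hand side, as in the edited question, drop = FALSE also applies to the left-hand side, so combinations that never occur (for example id = 1 with id2 = 2) can show up as extra all-zero rows. A minimal sketch of one way around that while staying in reshape2, assuming each id maps to exactly one id2 (true for the example data, an assumption otherwise): cast on id alone, then merge id2 back in. The names ids, wide_by_id and wide are only illustrative.

```r
library(reshape2)

dataDF <- data.frame(
  id = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x = c(rep("t1", 3), rep("t2", 3)),
  y = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

# Cast on id only, keeping unused factor levels as all-zero columns.
wide_by_id <- dcast(dataDF, id ~ x + y, value.var = "value",
                    fill = 0, drop = FALSE)

# Assumption: id determines id2, so the lookup has one row per id.
ids <- unique(dataDF[, c("id", "id2")])

# Re-attach id2; merge() puts the key first, then id2, then the cast columns.
wide <- merge(ids, wide_by_id, by = "id")
wide
```

If an id could carry several id2 values, this lookup would duplicate rows, so the join-based approach further down is the safer choice in that case.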

For your aside: yes. You just need to tell dcast how to aggregate by passing fun.aggregate. In this case you want length, which counts the rows that fall into each cell:

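# Select columns by name: since id2 was added, dataDF[, 1:3] would no longer include y
data2 <- dataDF[, c("id", "x", "y")]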
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
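
For comparison, the same 0/1 indicator columns can be built in base R without a value column and without any reshaping package. This is only a sketch, not part of the reshape2 approach above: it assumes one row per id, as in the example data, and the names combo, ind and wide are illustrative.

```r
dataDF <- data.frame(
  id = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x = c(rep("t1", 3), rep("t2", 3)),
  y = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D"))
)

# interaction() keeps every x/y combination as a level, including unused ones.
# lex.order = TRUE orders the levels t1_A, t1_B, ..., t2_D.
combo <- interaction(dataDF$x, dataDF$y, sep = "_", lex.order = TRUE)

# One 0/1 column per level: 1 where the row has that combination.
ind <- sapply(levels(combo), function(l) as.integer(combo == l))

wide <- cbind(dataDF[, c("id", "id2")], ind)
wide
```

Because the indicators are computed row by row, this keeps id and id2 exactly as they appear in the data and never generates extra id combinations.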

For your edit, I'd use tidyr and dplyr rather than reshape2:

library(tidyr)
library(dplyr)

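# Under R >= 4.0, x is a character column, so levels(dataDF$x) would be NULL;
# use its unique values instead. y is a factor, so levels() keeps the unused 'D'.
grid <- expand.grid(x = unique(as.character(dataDF$x)), y = levels(dataDF$y),
                    stringsAsFactors = FALSE)

left_join(grid, dataDF, by = c("x", "y")) %>%
  unite(z, x, y) %>%
  spread(z, value, fill = 0) %>%
  na.omit()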

First we build every combination of x and y with expand.grid and join the data onto it. Then we unite x and y into a single column, z, spread z out into columns, and finally remove the rows whose id columns are NA (the combinations that never occur in the data):

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0
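
A more recent alternative, sketched here on the assumption that tidyr 1.2.0 or later is installed: pivot_wider() has a names_expand argument that expands the names_from columns to every combination, including unused factor levels, so the expand.grid and na.omit steps are not needed. The name wide is illustrative.

```r
library(tidyr)

dataDF <- data.frame(
  id = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x = c(rep("t1", 3), rep("t2", 3)),
  y = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

wide <- pivot_wider(
  dataDF,
  names_from = c(x, y),   # column names become x_y, e.g. t1_A
  values_from = value,
  values_fill = 0,        # cells with no matching row get 0
  names_expand = TRUE     # assumption: requires tidyr >= 1.2.0
)
wide
```

Only the column side is expanded here; the rows stay limited to the id and id2 pairs that actually occur.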

jeremycg answered Oct 13 '22 21:10