I am using the reshape2 function dcast on a dataframe. One of the variables is a factor where some of the levels do not appear in the dataframe, but I would like to include all levels in the new columns created.
For example, say I run the following:
library(reshape2)
dataDF <- data.frame(
id = 1:6,
id2 = c(1,2,3,1,2,3),
x = c(rep('t1', 3), rep('t2', 3)),
y = factor(c('A', 'B', 'A', 'B', 'B', 'C'), levels = c('A', 'B', 'C', 'D')),
value = rep(1)
)
dcast(dataDF, id + id2 ~ x + y, fill = 0)
I get the following
id id2 t1_A t1_B t2_B t2_C
1 1 1 1 0 0 0
2 2 2 0 1 0 0
3 3 3 1 0 0 0
4 4 1 0 0 1 0
5 5 2 0 0 1 0
6 6 3 0 0 0 1
But I also want to include the columns t1_C, t1_D, t2_A and t2_D full of 0's
i.e. I want the following
id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1 1 1 1 0 0 0 0 0 0 0
2 2 2 0 1 0 0 0 0 0 0
3 3 3 1 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 1 0 0
5 5 2 0 0 0 0 0 1 0 0
6 6 3 0 0 0 0 0 0 1 0
Also, as an aside, would it be possible to create the above without having the column 'value' full of ones in the initial dataframe? Basically, I just want to cast x and y into their own columns, with a 1 if that combination exists for that id.
Thanks in advance
EDIT: I initially had one variable on the LHS, which Jeremy answers below, but I actually have more than one variable on the LHS, so I edited the question to reflect this.
Try adding drop = FALSE to your dcast call, so that unused factor levels are not dropped:
dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)
id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1 1 1 0 0 0 0 0 0 0
2 2 0 1 0 0 0 0 0 0
3 3 1 0 0 0 0 0 0 0
4 4 0 0 0 0 0 1 0 0
5 5 0 0 0 0 0 1 0 0
6 6 0 0 0 0 0 0 1 0
For your aside, yes, you just need to tell dcast what you want by passing an aggregation function; in this case you want length:
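# Drop the value column; select by name, since the first three columns of the edited dataDF no longer include y
data2 <- dataDF[, c("id", "x", "y")]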
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
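If it helps, the same 0/1 indicator layout can also be sketched in base R without any value column. This is only an illustration, not part of the original answer: it assumes each row of dataDF is a distinct id/id2 pair (as in the example), and the names z, ind and wide_base are made up here.

```r
# Example data from the question, without the 'value' column
dataDF <- data.frame(
  id  = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x   = c(rep("t1", 3), rep("t2", 3)),
  y   = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D"))
)

# Combine x and y into one factor; interaction() keeps every level
# combination, including those that never occur in the data
z <- interaction(dataDF$x, dataDF$y, sep = "_", lex.order = TRUE)

# One row per input row, one column per level of z, counts as cell values
ind <- as.data.frame.matrix(table(seq_len(nrow(dataDF)), z))

# Attach the id columns (assumes one input row per id/id2 pair)
wide_base <- cbind(dataDF[c("id", "id2")], ind)
wide_base
```

Because the counts come from table(), a combination that appeared twice for the same row key would show 2 rather than 1; with one row per id/id2 pair that cannot happen.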
For your edit, I'd use tidyr and dplyr rather than reshape2:
library(tidyr)
library(dplyr)
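# Every x/y combination, including unused levels of y (x may be character, so use unique())
grid <- expand.grid(x = unique(dataDF$x), y = levels(dataDF$y), stringsAsFactors = FALSE)
grid %>%
  left_join(dataDF, by = c("x", "y")) %>%
  unite(z, x, y) %>%
  spread(z, value, fill = 0) %>%
  na.omit()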
First we complete all combinations of x and y using expand.grid and a join, then we unite them into one column, z, then we spread them out, and finally we remove the rows that have NA in the id columns:
id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1 1 1 1 0 0 0 0 0 0 0
2 2 2 0 1 0 0 0 0 0 0
3 3 3 1 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 1 0 0
5 5 2 0 0 0 0 0 1 0 0
6 6 3 0 0 0 0 0 0 1 0
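As a possible alternative on newer tidyr versions, pivot_wider can expand the column names itself. This is a sketch under the assumption that tidyr 1.2.0 or later is installed, where the names_expand argument is available; it was not part of the original answer, and the name wide is made up here.

```r
library(tidyr)

# Example data from the question
dataDF <- data.frame(
  id    = 1:6,
  id2   = c(1, 2, 3, 1, 2, 3),
  x     = c(rep("t1", 3), rep("t2", 3)),
  y     = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

wide <- pivot_wider(
  dataDF,
  id_cols      = c(id, id2),   # keep only the observed id/id2 pairs as rows
  names_from   = c(x, y),
  values_from  = value,
  names_sep    = "_",
  names_expand = TRUE,         # assumption: tidyr >= 1.2.0; expands to all x/y combinations
  values_fill  = 0
)
wide
```

Unlike drop = FALSE in dcast, names_expand only expands the new columns, so the rows stay limited to the id/id2 pairs that are actually present.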