The drop
argument in dcast
(from "reshape2" or "dplyr") can be useful when going from a "long" to a "wide" dataset and you want to create columns even for combinations that do not exist in the long form.
It turns out that using drop
also affects combinations the left hand side (LHS) of the formula as well as the right hand side (RHS). Thus, it also creates extra rows based on the combinations of LHS values.
Is there a way to override this behavior?
Here's some sample data:
library(data.table)
DT <- data.table(v1 = c(1.105, 1.105, 1.105, 2.012, 2.012, 2.012),
ID = c(1L, 1L, 1L, 2L, 2L, 2L),
v2 = structure(c(2L, 3L, 5L, 1L, 2L, 6L),
.Label = c("1", "2", "3", "4", "5", "6"),
class = "factor"),
v3 = c(3L, 2L, 2L, 5L, 4L, 3L))
Notice that "v2" is a factor
column with 6 levels. I essentially want to go from "long" to wide", but add in columns for any missing factor levels (in this case "4").
reshape
handles the shape, but not the missing columns:
reshape(DT, direction = "wide", idvar = c("ID", "v1"), timevar = "v2")
# v1 ID v3.2 v3.3 v3.5 v3.1 v3.6
# 1: 1.105 1 3 2 2 NA NA
# 2: 2.012 2 4 NA NA 5 3
dcast
handles adding the missing columns, but only if there's one value on the LHS:
dcast(DT, ID ~ v2, value.var = "v3", drop = FALSE)
# ID 1 2 3 4 5 6
# 1: 1 NA 3 2 NA 2 NA
# 2: 2 5 4 NA NA NA 3
If there are multiple values on the LHS, the combinations of the values on the LHS are also expanded out, as if we had used CJ
or expand.grid
, but rows 2 and 3 are not at all of interest to me:
dcast(DT, ... ~ v2, value.var = "v3", drop = FALSE)
# v1 ID 1 2 3 4 5 6
# 1: 1.105 1 NA 3 2 NA 2 NA
# 2: 1.105 2 NA NA NA NA NA NA
# 3: 2.012 1 NA NA NA NA NA NA
# 4: 2.012 2 5 4 NA NA NA 3
This is similar to using xtabs
in base R: ftable(xtabs(v3 ~ ID + v1 + v2, DT))
.
Is there a way to let dcast
know that essentially, "Hey. The combination of values on the LHS are the IDs. Don't try to fill them in for me."
My current approach is to do three steps, one for collapsing down the LHS values, another for spreading out the RHS values, and then one for merging the result.
merge(DT[, list(v1 = unique(v1)), .(ID)], ## or unique(DT[, c("ID", "v1"), with = FALSE])
dcast(DT, ID ~ v2, value.var = "v3", drop = FALSE),
by = "ID")[]
# ID v1 1 2 3 4 5 6
# 1: 1 1.105 NA 3 2 NA 2 NA
# 2: 2 2.012 5 4 NA NA NA 3
Is there a better approach that I'm missing?
dcast: Convert data between wide and long forms.
dcast uses formula interface. The variables on the LHS of formula represents the id vars and RHS the measure vars. value. var denotes the column to be filled in with while casting to wide format.
In this example, we use the dcast() function to reshape our data. That is, we can use dcast() to order our data rows according to specific variables. Table 5 reveals the output of the previous R programming syntax – Before, the data was ordered as time, feature, ID. Now it is ordered as ID, time, feature.
reshape2 is an R package written by Hadley Wickham that makes it easy to transform data between wide and long formats.
Just implemented in data.table development version v1.9.7, commit 2113, closes #1512.
require(data.table) # v1.9.7, commit 2113+
dcast(DT, ... ~ v2, value.var = "v3", drop = c(TRUE, FALSE))
# v1 ID 1 2 3 4 5 6
# 1: 1.105 1 NA 3 2 NA 2 NA
# 2: 2.012 2 5 4 NA NA NA 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With