Make the `drop` argument in `dcast` only look at the RHS of the formula

Tags:

The drop argument in dcast (from "reshape2" or "dplyr") can be useful when going from a "long" to a "wide" dataset and you want to create columns even for combinations that do not exist in the long form.

It turns out that using drop also affects combinations the left hand side (LHS) of the formula as well as the right hand side (RHS). Thus, it also creates extra rows based on the combinations of LHS values.

Is there a way to override this behavior?

Here's some sample data:

library(data.table)
DT <- data.table(v1 = c(1.105, 1.105, 1.105, 2.012, 2.012, 2.012),
                 ID = c(1L, 1L, 1L, 2L, 2L, 2L), 
                 v2 = structure(c(2L, 3L, 5L, 1L, 2L, 6L), 
                                .Label = c("1", "2", "3", "4", "5", "6"), 
                                class = "factor"),
                 v3 = c(3L, 2L, 2L, 5L, 4L, 3L))

Notice that "v2" is a factor column with 6 levels. I essentially want to go from "long" to wide", but add in columns for any missing factor levels (in this case "4").

reshape handles the shape, but not the missing columns:

reshape(DT, direction = "wide", idvar = c("ID", "v1"), timevar = "v2")
#       v1 ID v3.2 v3.3 v3.5 v3.1 v3.6
# 1: 1.105  1    3    2    2   NA   NA
# 2: 2.012  2    4   NA   NA    5    3

dcast handles adding the missing columns, but only if there's one value on the LHS:

dcast(DT, ID ~ v2, value.var = "v3", drop = FALSE)
#    ID  1 2  3  4  5  6
# 1:  1 NA 3  2 NA  2 NA
# 2:  2  5 4 NA NA NA  3

If there are multiple values on the LHS, the combinations of the values on the LHS are also expanded out, as if we had used CJ or expand.grid, but rows 2 and 3 are not at all of interest to me:

dcast(DT, ... ~ v2, value.var = "v3", drop = FALSE)
#       v1 ID  1  2  3  4  5  6
# 1: 1.105  1 NA  3  2 NA  2 NA
# 2: 1.105  2 NA NA NA NA NA NA
# 3: 2.012  1 NA NA NA NA NA NA
# 4: 2.012  2  5  4 NA NA NA  3

This is similar to using xtabs in base R: ftable(xtabs(v3 ~ ID + v1 + v2, DT)).

Is there a way to let dcast know that essentially, "Hey. The combination of values on the LHS are the IDs. Don't try to fill them in for me."

My current approach is to do three steps, one for collapsing down the LHS values, another for spreading out the RHS values, and then one for merging the result.

merge(DT[, list(v1 = unique(v1)), .(ID)],  ## or unique(DT[, c("ID", "v1"), with = FALSE])
      dcast(DT, ID ~ v2, value.var = "v3", drop = FALSE), 
      by = "ID")[]
#    ID    v1  1 2  3  4  5  6
# 1:  1 1.105 NA 3  2 NA  2 NA
# 2:  2 2.012  5 4 NA NA NA  3

Is there a better approach that I'm missing?

563

asked Jan 16 '16 19:01

A5C1D2H2I1M1N2O1R2T1

1 Answers

Just implemented in data.table development version v1.9.7, commit 2113, closes #1512.

require(data.table) # v1.9.7, commit 2113+
dcast(DT, ... ~ v2, value.var = "v3", drop = c(TRUE, FALSE))
#       v1 ID  1 2  3  4  5  6
# 1: 1.105  1 NA 3  2 NA  2 NA
# 2: 2.012  2  5 4 NA NA NA  3

answered Sep 29 '22 13:09

Arun

Related questions
                            
                                Send HTML message using gmailr
                            
                                How to use data.table as super class in S4
                            
                                3-D Cartesian points to 2-D hemispherical and calculate the area of 2-D Voronoi cells
                            
                                Update R version from 3.0.3 to 3.1.2
                            
                                R: Tukey posthoc tests for nnet multinom multinomial fit to test for overall differences in multinomial distribution
                            
                                R: Maps with Time Slider?
                            
                                Efficient weighted covariances in RcppEigen
                            
                                Are there any R package repository management tools?
                            
                                Is there a Python equivalent to the smooth.spline function in R
                            
                                Gitignore man directory of an R package?
                            
                                Incorporating observation weights in the randomForest package
                            
                                Unit Testing Shiny Apps
                            
                                Stitch images and data from coupled microscopy/spectroscopy into panoramic in Photoshop or R
                            
                                Reorder() not correctly reordering a factor variable in ggplot
                            
                                RStudio project and git repository in subdirectory
                            
                                Time in getting single elements from data.table and data.frame objects
                            
                                What are the restrictions on seq.int?
                            
                                Is there a datatype "Decimal" in R?
                            
                                Replace parts of a variable using numeric indices in dplyr. Do I need to create an index column and use ifelse?
                            
                                Assignment to replace value in nonlocal list

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Make the `drop` argument in `dcast` only look at the RHS of the formula

Tags:

r

data.table

reshape

reshape2

A5C1D2H2I1M1N2O1R2T1

People also ask

1 Answers

Arun

Recent Activity

Donate For Us