Can the value.var in dcast be a list or have multiple value variables?

Tags: r, data.table, reshape, reshape2

In the help files for dcast.data.table, there is a note stating that a new feature has been implemented: "dcast.data.table allows value.var column to be of type list"

I take this to mean that one can have multiple value variables within a list, i.e. in this format:

dcast.data.table(dt, x1~x2, value.var=list('var1','var2','var3'))

But we get an error: 'value.var' must be a character vector of length 1.

Is there such a feature, and if not, what would be other one-liner alternatives?

EDIT: In reply to the comments below

There are situations where you have multiple variables that you want to treat as the value.var. Imagine for example that x2 consists of 3 different weeks, and you have 2 value variables such as salt and sugar consumption and you want to cast those variables across the different weeks. Sure, you can 'melt' the 2 value variables into a single column, but why do something using two functions, when you can do it in one function like reshape does?

(Note: I've also noticed that reshape cannot treat multiple variables as the time variable as dcast does.)

So my point is that I don't understand why these functions don't allow for the flexibility to include multiple variables within the value.var or the time.var just as we allow for multiple variables for the id.var.

asked Apr 14 '14 09:04

AlexR

2 Answers

From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the Efficient reshaping using data.tables vignette for more.

Here's how we could use dcast:

    dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
    #    x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
    # 1:  1      3      4      6       1       2       2
    # 2:  2     10      3      9       5       3       6
    # 3:  3     10      7      7       4       6       7
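The multiple-aggregation case mentioned above could look roughly like the sketch below. The table `dt` and its values are made up for illustration (they are not the `mydf` from the other answer), and it assumes data.table 1.9.6 or later is installed. Each `(x1, x2)` cell has two rows, so an aggregation function is actually needed.

```r
library(data.table)

# Hypothetical data: two readings per (x1, x2) cell, so casting must aggregate.
dt <- data.table(
  x1    = rep(1:2, each = 4),
  x2    = rep(rep(1:2, each = 2), times = 2),
  salt  = c(1, 3, 5, 7, 2, 4, 6, 8),
  sugar = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Cast both value columns at once and apply two aggregation functions.
# The result has one column per combination of value.var, function and x2 level.
wide <- dcast(dt, x1 ~ x2,
              value.var = c("salt", "sugar"),
              fun.aggregate = list(sum, mean))
print(wide)
```

With two value columns, two functions and two levels of `x2`, the result should have one row per `x1` and eight value columns in addition to `x1`.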

answered Oct 02 '22 10:10

Arun

Update

Apparently, the fix was much easier...

Technically, your statement that "apparently there is no such feature" isn't quite correct. There is such a feature in the recast function (which sort of hides the melting and casting process), but it seems like Hadley forgot to finish the function or something: the function returns a list of the relevant parts of your operation.

Here's a minimal example...

Some sample data:

    set.seed(1)
    mydf <- data.frame(x1 = rep(1:3, each = 3),
                       x2 = rep(1:3, 3),
                       salt = sample(10, 9, TRUE),
                       sugar = sample(7, 9, TRUE))

    mydf
    #   x1 x2 salt sugar
    # 1  1  1    3     1
    # 2  1  2    4     2
    # 3  1  3    6     2
    # 4  2  1   10     5
    # 5  2  2    3     3
    # 6  2  3    9     6
    # 7  3  1   10     4
    # 8  3  2    7     6
    # 9  3  3    7     7

The effect you seem to be trying to achieve:

    reshape(mydf, idvar='x1', timevar='x2', direction='wide')
    #   x1 salt.1 sugar.1 salt.2 sugar.2 salt.3 sugar.3
    # 1  1      3       1      4       2      6       2
    # 4  2     10       5      3       3      9       6
    # 7  3     10       4      7       6      7       7

recast in action. (Note that the values are all what we would expect, in the dimensions we would expect.)

    library(reshape2)
    out <- recast(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
    ### recast(mydf, x1 ~ x2 + variable, id.var = c("x1", "x2"))
    out
    # $data
    #      [,1] [,2] [,3] [,4] [,5] [,6]
    # [1,]    3    1    4    2    6    2
    # [2,]   10    5    3    3    9    6
    # [3,]   10    4    7    6    7    7
    #
    # $labels
    # $labels[[1]]
    #   x1
    # 1  1
    # 2  2
    # 3  3
    #
    # $labels[[2]]
    #   x2 variable
    # 1  1     salt
    # 2  1    sugar
    # 3  2     salt
    # 4  2    sugar
    # 5  3     salt
    # 6  3    sugar

I'm honestly not sure if this was an incomplete function, or if it is a helper function to another function.

All of the information is there to be able to put the data back together again, making it easy to write a function like this:

    recast2 <- function(...) {
      inList <- recast(...)
      setNames(cbind(inList[[2]][[1]], inList[[1]]),
               c(names(inList[[2]][[1]]),
                 do.call(paste, c(rev(inList[[2]][[2]]), sep = "_"))))
    }
    recast2(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
    #   x1 salt_1 sugar_1 salt_2 sugar_2 salt_3 sugar_3
    # 1  1      3       1      4       2      6       2
    # 2  2     10       5      3       3      9       6
    # 3  3     10       4      7       6      7       7

Again, a possible advantage with the recast2 approach is the ability to aggregate as well as reshape in the same step.
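To illustrate aggregating and reshaping together, here is a minimal sketch that spells out the melt and cast steps that `recast` hides, rather than going through `recast2`. The data frame `df` is hypothetical (it deliberately contains duplicate `x1`/`x2` rows), and `sum` is just one possible choice of aggregation function.

```r
# Hypothetical data with duplicate (x1, x2) rows, so values must be aggregated.
df <- data.frame(
  x1    = c(1, 1, 1, 2, 2, 2),
  x2    = c(1, 1, 2, 1, 2, 2),
  salt  = c(3, 4, 6, 10, 3, 9),
  sugar = c(1, 2, 2, 5, 3, 6)
)

# Step 1: stack salt and sugar into a single 'value' column,
# with a 'variable' column recording which measure each row came from.
molten <- reshape2::melt(df, id.vars = c("x1", "x2"),
                         measure.vars = c("salt", "sugar"))

# Step 2: cast wide, summing the duplicates that fall into the same cell.
wide <- reshape2::dcast(molten, x1 ~ variable + x2,
                        value.var = "value", fun.aggregate = sum)
print(wide)
```

Putting `variable` before `x2` on the right-hand side of the formula should give column names of the form `salt_1`, `salt_2`, `sugar_1`, `sugar_2`. Swapping the order groups the columns by `x2` instead.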

answered Oct 02 '22 09:10

A5C1D2H2I1M1N2O1R2T1
