
How to melt and cast dataframes using dplyr?

Recently I have been doing all my data manipulation with dplyr, and it is an excellent tool for that. However, I am unable to melt or cast a data frame using dplyr. Is there any way to do that? Right now I am using reshape2 for this purpose.

I want a dplyr solution for:

require(reshape2)
data(iris)
dat <- melt(iris, id.vars = "Species")
Koundy asked Jul 22 '14 07:07




2 Answers

The successor to reshape2 is tidyr. The equivalents of melt() and dcast() are gather() and spread(), respectively. The equivalent of your code would then be

library(tidyr)
data(iris)
dat <- gather(iris, variable, value, -Species)

If you have magrittr loaded, you can use the pipe operator as in dplyr, i.e. write

dat <- iris %>% gather(variable, value, -Species) 

Note that you need to specify the variable and value names explicitly, unlike in melt(). I find the syntax of gather() quite convenient, because you can either specify the columns you want converted to long format, or specify the ones you want to keep unchanged in the new data frame by prefixing them with '-' (as for Species above), which is a bit faster to type than in melt(). However, I've noticed that, on my machine at least, tidyr can be noticeably slower than reshape2.
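To illustrate the two ways of selecting columns described above, here is a minimal sketch. The object names long_a and long_b are made up for this example, and it assumes a tidyr version that still provides gather().

```r
library(tidyr)

# Form 1: name the measurement columns to gather (a range of columns)
long_a <- gather(iris, variable, value, Sepal.Length:Petal.Width)

# Form 2: gather everything except the id column, using the '-' prefix
long_b <- gather(iris, variable, value, -Species)

# Both calls should select the same four columns,
# so the two results are expected to match
identical(long_a, long_b)
```

The '-' form tends to be shorter when there are only a few id columns and many measurement columns.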

Edit: In reply to @hadley's comment below, here is some timing information comparing the two functions on my PC.

library(reshape2)        # provides melt()
library(tidyr)           # provides gather()
library(microbenchmark)

microbenchmark(
    melt = melt(iris, id.vars = "Species"),
    gather = gather(iris, variable, value, -Species)
)
# Unit: microseconds
#    expr     min       lq  median       uq      max neval
#    melt 278.829 290.7420 295.797 320.5730  389.626   100
#  gather 536.974 552.2515 567.395 683.2515 1488.229   100

# Larger test: resample iris to one million rows
set.seed(1)
iris1 <- iris[sample(1:nrow(iris), 1e6, replace = TRUE), ]

system.time(melt(iris1, id.vars = "Species"))
#    user  system elapsed
#   0.012   0.024   0.036

system.time(gather(iris1, variable, value, -Species))
#    user  system elapsed
#   0.364   0.024   0.387

sessionInfo()
# R version 3.1.1 (2014-07-10)
# Platform: x86_64-pc-linux-gnu (64-bit)
#
# locale:
#  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
#  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
#  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
#  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
# [1] reshape2_1.4         microbenchmark_1.3-0 magrittr_1.0.1
# [4] tidyr_0.1
#
# loaded via a namespace (and not attached):
# [1] assertthat_0.1 dplyr_0.2      parallel_3.1.1 plyr_1.8.1     Rcpp_0.11.2
# [6] stringr_0.6.2  tools_3.1.1
konvas answered Sep 22 '22 02:09


In addition, casting can be done with tidyr::spread().

Here is an example:

library(reshape2)
library(tidyr)
library(dplyr)

# example data : `mini_iris`
(mini_iris <- iris[c(1, 51, 101), ])

# melt
(melted1 <- mini_iris %>% melt(id.vars = "Species"))         # on reshape2
(melted2 <- mini_iris %>% gather(variable, value, -Species)) # on tidyr

# cast
melted1 %>% dcast(Species ~ variable, value.var = "value") # on reshape2
melted2 %>% spread(variable, value)                        # on tidyr
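A note for readers on newer tidyr releases: since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). This is not part of the original answer. It is only a minimal sketch of the same melt/cast round trip, assuming tidyr 1.0.0 or later is installed; the object names long and wide are made up for illustration.

```r
library(tidyr)  # assumes tidyr >= 1.0.0

# same example data as above
mini_iris <- iris[c(1, 51, 101), ]

# melt / gather equivalent: everything except Species goes to long format
long <- pivot_longer(mini_iris, cols = -Species,
                     names_to = "variable", values_to = "value")

# dcast / spread equivalent: back to one column per variable
wide <- pivot_wider(long, names_from = variable, values_from = value)

long
wide
```

Both functions return tibbles, and the row order of the long result may differ from what gather() produces.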
Lovetoken answered Sep 23 '22 02:09