How to melt and cast dataframes using dplyr?

Recently I have been doing all my data manipulation with dplyr, and it is an excellent tool for that. However, I am unable to melt or cast a data frame using dplyr. Is there any way to do that? Right now I am using reshape2 for this purpose.

I want a dplyr solution for:

    require(reshape2)
    data(iris)
    dat <- melt(iris,id.vars="Species")

702

asked Jul 22 '14 07:07

Koundy

2 Answers

The successor to reshape2 is tidyr. The equivalents of melt() and dcast() are gather() and spread(), respectively. The equivalent of your code would then be

    library(tidyr)
    data(iris)
    dat <- gather(iris, variable, value, -Species)

If you have magrittr loaded, you can use the pipe operator as in dplyr, i.e. write

dat <- iris %>% gather(variable, value, -Species)

Note that you need to specify the variable and value names explicitly, unlike in melt(). I find the syntax of gather() quite convenient, because you can just specify the columns you want to be converted to long format, or specify the ones you want to remain in the new data frame by prefixing them with '-' (just like for Species above), which is a bit faster to type than in melt(). However, I've noticed that on my machine at least, tidyr can be noticeably slower than reshape2.
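To illustrate the two ways of selecting columns described above, here is a minimal sketch, assuming tidyr is installed. The object names long_excl and long_incl are made up for this example. Both calls stack the same four measurement columns of iris, once by excluding Species and once by naming the measure columns as a range.

```r
library(tidyr)

# Stack every column except Species (exclusion with '-').
long_excl <- gather(iris, variable, value, -Species)

# Stack the same columns by naming them explicitly as a range.
long_incl <- gather(iris, variable, value, Sepal.Length:Petal.Width)

# Both results have one row per (original row, measure column) pair.
dim(long_excl)
dim(long_incl)
```

Either form should give a 600 x 3 data frame (150 rows times 4 measure columns), with columns Species, variable and value.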

Edit: In reply to @hadley's comment below, I'm posting some timing info comparing the two functions on my PC.

    library(microbenchmark)
    microbenchmark(
        melt = melt(iris,id.vars="Species"),
        gather = gather(iris, variable, value, -Species)
    )
    # Unit: microseconds
    #    expr     min       lq  median       uq      max neval
    #    melt 278.829 290.7420 295.797 320.5730  389.626   100
    #  gather 536.974 552.2515 567.395 683.2515 1488.229   100

    set.seed(1)
    iris1 <- iris[sample(1:nrow(iris), 1e6, replace = T), ]

    system.time(melt(iris1,id.vars="Species"))
    #    user  system elapsed
    #   0.012   0.024   0.036

    system.time(gather(iris1, variable, value, -Species))
    #    user  system elapsed
    #   0.364   0.024   0.387

    sessionInfo()
    # R version 3.1.1 (2014-07-10)
    # Platform: x86_64-pc-linux-gnu (64-bit)
    #
    # locale:
    #  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
    #  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
    #  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
    #  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
    #  [9] LC_ADDRESS=C               LC_TELEPHONE=C
    # [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
    #
    # attached base packages:
    # [1] stats     graphics  grDevices utils     datasets  methods   base
    #
    # other attached packages:
    # [1] reshape2_1.4         microbenchmark_1.3-0 magrittr_1.0.1
    # [4] tidyr_0.1
    #
    # loaded via a namespace (and not attached):
    # [1] assertthat_0.1 dplyr_0.2      parallel_3.1.1 plyr_1.8.1     Rcpp_0.11.2
    # [6] stringr_0.6.2  tools_3.1.1

101

answered Sep 22 '22 02:09

konvas

In addition, casting can be done with tidyr::spread().

Here is an example:

    library(reshape2)
    library(tidyr)
    library(dplyr)

    # example data : `mini_iris`
    (mini_iris <- iris[c(1, 51, 101), ])

    # melt
    (melted1 <- mini_iris %>% melt(id.vars = "Species"))         # on reshape2
    (melted2 <- mini_iris %>% gather(variable, value, -Species)) # on tidyr

    # cast
    melted1 %>% dcast(Species ~ variable, value.var = "value") # on reshape2
    melted2 %>% spread(variable, value)                        # on tidyr
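One caveat about casting, shown as a hedged sketch rather than a definitive recipe: spread() needs the remaining columns plus the key to identify each value uniquely. The three-row mini_iris satisfies this because each species appears once. With the full iris data, spread() would stop with a duplicate-identifier error unless a row identifier is added first. The column name row_id and the object name wide_again are assumptions introduced here for illustration.

```r
library(dplyr)
library(tidyr)

wide_again <- iris %>%
  mutate(row_id = row_number()) %>%            # hypothetical id column, one value per original row
  gather(variable, value, -Species, -row_id) %>% # melt: keep Species and row_id as identifiers
  spread(variable, value)                      # cast: one column per measured variable again

dim(wide_again)
```

The result should have 150 rows again, with the four measurement columns restored alongside Species and row_id.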

answered Sep 23 '22 02:09

Lovetoken

Tags: r, dplyr, reshape, melt
