
Comparing gather (tidyr) to melt (reshape2)

Tags: r, tidyr, reshape2

I love the reshape2 package because it made life so doggone easy. Typically Hadley has made improvements in his previous packages that enable streamlined, faster running code. I figured I'd give tidyr a whirl and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

  teacher yr1.baseline     pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
1       3      1/13/09 2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10
2       7      1/15/09 2/5/09      3/3/09      5/5/09    10/16/09    11/18/09      3/4/10
3       8      1/27/09 2/5/09      3/3/09     4/27/09     10/7/09    11/18/09      3/5/10

Code

Here's the code in melt fashion, followed by my attempt at gather. How can I make gather do the same thing as melt?

library(reshape2); library(dplyr); library(tidyr)

dat %>%
    melt(id=c("teacher", "pd"), value.name="date")

dat %>%
    gather(key=c(teacher, pd), value=date, -c(teacher, pd))

Desired Output

   teacher     pd     variable     date
1        3 2/5/09 yr1.baseline  1/13/09
2        7 2/5/09 yr1.baseline  1/15/09
3        8 2/5/09 yr1.baseline  1/27/09
4        3 2/5/09  yr1.lesson1   3/6/09
5        7 2/5/09  yr1.lesson1   3/3/09
6        8 2/5/09  yr1.lesson1   3/3/09
7        3 2/5/09  yr1.lesson2  4/27/09
8        7 2/5/09  yr1.lesson2   5/5/09
9        8 2/5/09  yr1.lesson2  4/27/09
10       3 2/5/09  yr2.lesson1  10/7/09
11       7 2/5/09  yr2.lesson1 10/16/09
12       8 2/5/09  yr2.lesson1  10/7/09
13       3 2/5/09  yr2.lesson2 11/18/09
14       7 2/5/09  yr2.lesson2 11/18/09
15       8 2/5/09  yr2.lesson2 11/18/09
16       3 2/5/09  yr2.lesson3   3/4/10
17       7 2/5/09  yr2.lesson3   3/4/10
18       8 2/5/09  yr2.lesson3   3/5/10

Data

dat <- structure(list(teacher = structure(1:3, .Label = c("3", "7",
    "8"), class = "factor"), yr1.baseline = structure(1:3, .Label = c("1/13/09",
    "1/15/09", "1/27/09"), class = "factor"), pd = structure(c(1L,
    1L, 1L), .Label = "2/5/09", class = "factor"), yr1.lesson1 = structure(c(2L,
    1L, 1L), .Label = c("3/3/09", "3/6/09"), class = "factor"), yr1.lesson2 = structure(c(1L,
    2L, 1L), .Label = c("4/27/09", "5/5/09"), class = "factor"),
        yr2.lesson1 = structure(c(2L, 1L, 2L), .Label = c("10/16/09",
        "10/7/09"), class = "factor"), yr2.lesson2 = structure(c(1L,
        1L, 1L), .Label = "11/18/09", class = "factor"), yr2.lesson3 = structure(c(1L,
        1L, 2L), .Label = c("3/4/10", "3/5/10"), class = "factor")), .Names = c("teacher",
    "yr1.baseline", "pd", "yr1.lesson1", "yr1.lesson2", "yr2.lesson1",
    "yr2.lesson2", "yr2.lesson3"), row.names = c(NA, -3L), class = "data.frame")
Tyler Rinker, asked Oct 23 '14 19:10



1 Answer

Your gather line should look like:

dat %>% gather(variable, date, -teacher, -pd) 

This says "Gather all variables except teacher and pd, calling the new key column 'variable' and the new value column 'date'."
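As a rough check that the two calls line up, the sketch below runs both on a cut-down stand-in for `dat`. The stand-in uses plain character columns instead of the question's factors, which is an assumption made here so that neither function has to reconcile differing factor levels.

```r
library(reshape2)
library(tidyr)

# Cut-down stand-in for the question's `dat`. Character columns (not factors)
# are an assumption made for this sketch, not the original data layout.
dat_small <- data.frame(
  teacher      = c("3", "7", "8"),
  yr1.baseline = c("1/13/09", "1/15/09", "1/27/09"),
  pd           = c("2/5/09", "2/5/09", "2/5/09"),
  yr1.lesson1  = c("3/6/09", "3/3/09", "3/3/09"),
  stringsAsFactors = FALSE
)

# reshape2: name the id columns; everything else is melted
melted <- melt(dat_small, id.vars = c("teacher", "pd"), value.name = "date")

# tidyr: new key column name, new value column name, then the columns to
# gather (here: everything except teacher and pd)
gathered <- gather(dat_small, variable, date, -teacher, -pd)

# Both results have the columns teacher, pd, variable, date.
# melt() stores `variable` as a factor, while gather() keeps it as character
# by default, so convert before comparing the two.
names(gathered)
identical(as.character(melted$variable), gathered$variable)
identical(melted$date, gathered$date)
```

Note that in `gather`, the first two arguments after the data are the names of the new columns, so the id columns never appear there; they are simply the ones left out of the gathering.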


As an explanation, note the following from the help(gather) page:

 ...: Specification of columns to gather. Use bare variable names.
      Select all variables between x and z with ‘x:z’, exclude y
      with ‘-y’. For more options, see the select documentation.

Since this is an ellipsis, the columns to gather are given as separate (bare name) arguments after the key and value column names. We wish to gather all columns except teacher and pd, so we prefix each of them with -.
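The selection styles mentioned in the help text can be tried on a toy data frame. This is only a sketch: `toy` and its column names (`id`, `x`, `y`, `z`) are made up for illustration and are not part of the question's data.

```r
library(tidyr)

# Hypothetical toy data; the column names are illustrative only
toy <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Exclusion with "-": gather every column except id
by_exclusion <- gather(toy, key, value, -id)

# Range with ":": gather the columns from x through z
by_range <- gather(toy, key, value, x:z)

# Separate bare names: list each column to gather as its own argument
by_name <- gather(toy, key, value, x, y, z)

# All three select the same columns, so the gathered keys and values
# can be compared directly
identical(by_exclusion$key, by_range$key)
identical(by_range$value, by_name$value)
```

Exclusion tends to be the shortest form when only a few id columns should stay put, which is the situation in the question.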

David Robinson, answered Oct 28 '22 22:10