In tidyr, what criteria does the function `gather` use to map a dataframe from wide to long?

Tags: dataframe, r, tidyr, reshape2

I'm trying to figure out the arguments for gather in the tidyr package.

I looked at the documentation, and the syntax looks like:

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)

There is an example in the help files:

stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)

I'm curious about the last line:
gather(stocks, stock, price, -time)

Here, stocks is clearly the data we want to modify, which is fine.

I can see that stock and price are the names for the key-value pair, but how does the function decide which columns to use when creating that pair? The original dataframe looks like this:

time        X            Y           Z
2009-01-01   1.10177950  -1.1926213  -7.4149618
2009-01-02   0.75578151  -4.3705737  -0.3117843
2009-01-03  -0.23823356  -1.3497319   3.8742654
2009-01-04   0.98744470  -4.2381224   0.7397038
2009-01-05   0.74139013  -2.5303960  -5.5197743

I don't see any indication that we should use any combination of X, Y or Z. When I use this function, I feel like I'm just choosing names for the columns of my long-format dataframe and praying that gather magically works. Come to think of it, I feel the same way when I use melt.

Does gather look at the column's type? How does it map from wide to long?

EDIT: Great answer and great discussion below. Anyone wanting more information on the philosophy and use of the tidyr package should definitely read this paper, although the vignette doesn't explain the syntax.

asked Jan 25 '15 05:01

tumultous_rooster

1 Answer

In "tidyr", you specify the measure variables for gather in the ... argument: the columns named there are the ones that get stacked into the key and value columns, and every other column is kept as an identifier. This is conceptually a little different from melt, where many examples (even many answers here on SO) show the id.vars argument instead, with the assumption that anything not specified as an ID is a measurement.

The ... argument can also take a column name prefixed with -, as in the example you have shown. This basically says "gather all of the columns except for this one". Another shorthand in gather is to specify a range of columns with a colon, for example, gather(stocks, stock, price, X:Z).
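To make the selection rules above concrete, here is a small sketch that reuses the stocks data from the question. The set.seed call and the by_exclusion, by_range and by_name names are my own additions for illustration. The three calls should select the same measure columns, so they should produce the same long dataframe.

```r
library(tidyr)

set.seed(1)  # added only so the random example data is reproducible
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# Three ways to tell gather which columns to stack:
by_exclusion <- gather(stocks, stock, price, -time)    # everything except time
by_range     <- gather(stocks, stock, price, X:Z)      # the range from X to Z
by_name      <- gather(stocks, stock, price, X, Y, Z)  # each column by name

# The column types play no part in the choice; only the selection in ... does.
names(by_exclusion)  # "time" "stock" "price"
dim(by_exclusion)    # 30 rows (10 dates x 3 gathered columns), 3 columns
all.equal(by_exclusion, by_range)
all.equal(by_range, by_name)
```

The names stock and price are just labels for the two new columns. The old column names (X, Y, Z) end up in stock and their values end up in price.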

You can compare gather with melt by looking at the code for the function. Here are the first few lines:

> tidyr:::gather_.data.frame
function (data, key_col, value_col, gather_cols, na.rm = FALSE, 
    convert = FALSE) 
{
    data2 <- reshape2::melt(data, measure.vars = gather_cols, 
        variable.name = key_col, value.name = value_col, na.rm = na.rm)
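The listing above is cut off after the first few lines and comes from the tidyr version I had installed. As far as I know, more recent tidyr releases no longer call reshape2 internally, so treat it as an illustration of the idea rather than of the current source. As a rough sketch of the correspondence, assuming both tidyr and reshape2 are installed (long_gather and long_melt are names I made up for this example):

```r
library(tidyr)
library(reshape2)

set.seed(1)  # added only so the random example data is reproducible
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# gather: name the measure columns (here, "everything except time")
long_gather <- gather(stocks, stock, price, -time)

# melt: the same reshape, written from the id.vars side
long_melt <- melt(stocks, id.vars = "time",
                  variable.name = "stock", value.name = "price")

# melt returns the key column as a factor, so compare it as character
all.equal(as.character(long_gather$stock), as.character(long_melt$stock))
all.equal(long_gather$price, long_melt$price)
```

Passing measure.vars = c("X", "Y", "Z") to melt instead of id.vars mirrors what gather does with ... more directly.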

answered Nov 01 '22 18:11

A5C1D2H2I1M1N2O1R2T1
