How can I calculate the percentage change within a group for multiple columns in R?

Tags: r, dplyr, summarization

I have a data frame with an ID column, a date column (12 months for each ID), and 23 numeric variables. I would like to obtain the month-over-month percentage change within each ID. I am using the quantmod package to compute the percent change.

Here is an example with only three columns (for simplicity):

ID Date V1 V2 V3
1  Jan   2  3  5
1  Feb   3  4  6
1  Mar   7  8  9
2  Jan   1  1  1
2  Feb   2  3  4
2  Mar   7  8  8

I tried to use dplyr and the summarise_each function, but that was unsuccessful. More specifically, I tried the following (train is the name of the data set):

library(dplyr)
library(quantmod)

group1 <- group_by(train, EXAMID)

foo <- function(x) {
  return(Delt(x))
}

summarise_each(group1, funs(foo))

I also tried to use the do function in dplyr, but I was not successful with that either (having a bad night I guess!).

I think that the issue is the Delt function. When I replace Delt with the sum function:

foo <- function(x) {
  return(sum(x))
}

summarise_each(group1, funs(foo))

The result is that every variable is summed across the dates for each ID. So how can I obtain the month-over-month percentage change for each ID?

asked Jul 11 '15 01:07

mmmmmmmmmm

2 Answers

The issue you are running into arises because your data is not in a "tidy" format. Your measurements (V1:V3) are spread across columns, which makes a "wide" data frame, and the tidyverse works best with long format. The good news is that the gather() function gets you exactly what you need. Here's a solution using the tidyverse.

library(tidyverse)

# Recreate data set
df <- tribble(
    ~ID, ~Date, ~V1, ~V2, ~V3,
    1,  "Jan",   2,  3,  5,
    1,  "Feb",   3,  4,  6,
    1,  "Mar",   7,  8,  9,
    2,  "Jan",   1,  1,  1,
    2,  "Feb",   2,  3,  4,
    2,  "Mar",   7,  8,  8
)
df
#> # A tibble: 6 × 5
#>      ID  Date    V1    V2    V3
#>   <dbl> <chr> <dbl> <dbl> <dbl>
#> 1     1   Jan     2     3     5
#> 2     1   Feb     3     4     6
#> 3     1   Mar     7     8     9
#> 4     2   Jan     1     1     1
#> 5     2   Feb     2     3     4
#> 6     2   Mar     7     8     8

# Gather and calculate percent change
df %>%
    gather(key = key, value = value, V1:V3) %>%
    group_by(ID, key) %>%
    mutate(lag = lag(value)) %>%
    mutate(pct.change = (value - lag) / lag)
#> Source: local data frame [18 x 6]
#> Groups: ID, key [6]
#> 
#>       ID  Date   key value   lag pct.change
#>    <dbl> <chr> <chr> <dbl> <dbl>      <dbl>
#> 1      1   Jan    V1     2    NA         NA
#> 2      1   Feb    V1     3     2  0.5000000
#> 3      1   Mar    V1     7     3  1.3333333
#> 4      2   Jan    V1     1    NA         NA
#> 5      2   Feb    V1     2     1  1.0000000
#> 6      2   Mar    V1     7     2  2.5000000
#> 7      1   Jan    V2     3    NA         NA
#> 8      1   Feb    V2     4     3  0.3333333
#> 9      1   Mar    V2     8     4  1.0000000
#> 10     2   Jan    V2     1    NA         NA
#> 11     2   Feb    V2     3     1  2.0000000
#> 12     2   Mar    V2     8     3  1.6666667
#> 13     1   Jan    V3     5    NA         NA
#> 14     1   Feb    V3     6     5  0.2000000
#> 15     1   Mar    V3     9     6  0.5000000
#> 16     2   Jan    V3     1    NA         NA
#> 17     2   Feb    V3     4     1  3.0000000
#> 18     2   Mar    V3     8     4  1.0000000
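
In current tidyr releases gather() is superseded by pivot_longer(), so the same idea might be written as in the sketch below. This is an illustration under some assumptions rather than part of the original answer: the object names long and wide are made up here, the data is rebuilt with data.frame() to keep the block self-contained, and the rows are assumed to already be in chronological order within each ID.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  V1   = c(2, 3, 7, 1, 2, 7),
  V2   = c(3, 4, 8, 1, 3, 8),
  V3   = c(5, 6, 9, 1, 4, 8)
)

# Reshape to long format, then compute the change within each ID/variable pair.
# Assumption: rows are already sorted by month within each ID.
long <- df %>%
  pivot_longer(cols = V1:V3, names_to = "key", values_to = "value") %>%
  group_by(ID, key) %>%
  mutate(pct.change = (value - dplyr::lag(value)) / dplyr::lag(value)) %>%
  ungroup()

# Optional: go back to one column per variable, now holding the changes
wide <- long %>%
  select(-value) %>%
  pivot_wider(names_from = key, values_from = pct.change)
```

With 23 variables, only the cols argument should need to change, for example to a range covering all measurement columns.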

answered Sep 19 '22 19:09

Matt Dancho

How about using pct <- function(x) x/lag(x)? (Or (x/lag(x)-1)*100, or however you wish to define percent change exactly.) For example, with dplyr loaded so that lag() is dplyr's version:

pct(1:3)
[1]  NA 2.0 1.5
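
One caveat: the result above depends on dplyr's lag(), which shifts the values by one position. If dplyr is not attached, lag() resolves to stats::lag(), which only adjusts time-series attributes and leaves the values where they are, so x / lag(x) no longer compares each value with the previous one. A small sketch that sidesteps this by calling dplyr::lag() explicitly (the helper names pct_ratio and pct_change are illustrative, not from the answer):

```r
# Illustrative helpers; the names are assumptions made for this sketch.
# Calling dplyr::lag() explicitly avoids picking up stats::lag() by accident.
pct_ratio  <- function(x) x / dplyr::lag(x)               # ratio to the previous value
pct_change <- function(x) (x / dplyr::lag(x) - 1) * 100   # percent change

pct_ratio(1:3)          # NA, 2, 1.5
pct_change(c(2, 3, 7))  # NA, 50, then roughly 133.33
```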

Edit: Adding Frank's suggestion

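library(dplyr)

pct <- function(x) {x/lag(x)}

# dt is the example data frame from the question.
# Note: mutate_each() and funs() are deprecated in recent dplyr releases.
dt %>% group_by(ID) %>% mutate_each(funs(pct), c(V1, V2, V3))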

ID Date       V1       V2  V3
1  Jan       NA       NA  NA
1  Feb 1.500000 1.333333 1.2
1  Mar 2.333333 2.000000 1.5
2  Jan       NA       NA  NA
2  Feb 2.000000 3.000000 4.0
2  Mar 3.500000 2.666667 2.0
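
Since mutate_each() and funs() are deprecated in newer dplyr versions, the same grouped calculation could be sketched with across() (available from dplyr 1.0.0). The names dt, pct_change and result are assumptions for this illustration, and the helper returns the change in percent rather than the ratio shown above.

```r
library(dplyr)

# Example data from the question
dt <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  V1   = c(2, 3, 7, 1, 2, 7),
  V2   = c(3, 4, 8, 1, 3, 8),
  V3   = c(5, 6, 9, 1, 4, 8)
)

# Percent change relative to the previous row (hypothetical helper name)
pct_change <- function(x) (x / dplyr::lag(x) - 1) * 100

# Apply the helper to every measurement column within each ID.
# Assumption: rows are already in chronological order within each ID.
result <- dt %>%
  group_by(ID) %>%
  mutate(across(V1:V3, pct_change)) %>%
  ungroup()

result
```

With many numeric columns, a selection such as across(where(is.numeric), pct_change) may be more convenient than listing them; across() leaves the grouping column ID untouched.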

answered Sep 22 '22 19:09

dzeltzer
