I have a data frame with an ID column, a date column (12 months for each ID), and 23 numeric variables. I would like to obtain the percentage change by month within each ID. I am using the quantmod package to compute the percent change.
Here is an example with only three columns (for simplicity):
ID Date V1 V2 V3
1 Jan 2 3 5
1 Feb 3 4 6
1 Mar 7 8 9
2 Jan 1 1 1
2 Feb 2 3 4
2 Mar 7 8 8
I tried to use dplyr and the summarise_each function, but that was unsuccessful. More specifically, I tried the following (train is the name of the data set):
library(dplyr)
library(quantmod)
group1<-group_by(train,EXAMID)
foo<-function(x){
return(Delt(x))
}
summarise_each(group1,funs(foo))
I also tried to use the do function in dplyr, but I was not successful with that either (having a bad night I guess!).
I think that the issue is the Delt function. When I replace Delt with the sum function:
foo<-function(x){
return(sum(x))
}
summarise_each(group1,funs(foo))
The result is that every variable is summed across the dates for each ID. So how can I compute the month-over-month percentage change for each ID?
To calculate percent, we need to divide the counts by the count sums for each sample, and then multiply by 100. This can also be done using the function decostand from the vegan package with method = "total".
To find the percentage difference in Excel, first find the difference between the two numbers and divide this difference by the base value. Then multiply the resulting decimal number by 100; this result represents the percentage difference.
To calculate the percentage difference between two numbers, a and b, perform the following calculations:
1. Find the absolute difference between the two numbers: |a - b|
2. Find the average of those two numbers: (a + b) / 2
3. Divide the difference by the average: |a - b| / ((a + b) / 2)
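The two formulas described above are easy to mix up, so here is a minimal R sketch of both. The function names pct_change and pct_diff are made up for illustration, and the final multiplication by 100 in pct_diff is an assumption (the steps above stop at the ratio).

```r
# Percent change from a base value to a new value:
# (new - base) / base, expressed as a percentage.
pct_change <- function(base, new) (new - base) / base * 100

# Symmetric percentage difference between a and b:
# |a - b| divided by the average of a and b, expressed as a percentage.
pct_diff <- function(a, b) abs(a - b) / ((a + b) / 2) * 100

pct_change(2, 3)  # change from 2 to 3, relative to 2
pct_diff(2, 3)    # difference between 2 and 3, relative to their mean
```

For a month-over-month change, the first form is usually the one wanted, because the previous month is the natural base value.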
The issue you are running into is that your data is not in a "tidy" format. Your observations (V1:V3) sit in separate columns, which makes a "wide" data frame, and the tidyverse works best with long format. The good news is that the gather() function gets you exactly what you need. Here's a solution using the tidyverse.
library(tidyverse)
# Recreate data set
df <- tribble(
~ID, ~Date, ~V1, ~V2, ~V3,
1, "Jan", 2, 3, 5,
1, "Feb", 3, 4, 6,
1, "Mar", 7, 8, 9,
2, "Jan", 1, 1, 1,
2, "Feb", 2, 3, 4,
2, "Mar", 7, 8, 8
)
df
#> # A tibble: 6 × 5
#> ID Date V1 V2 V3
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 Jan 2 3 5
#> 2 1 Feb 3 4 6
#> 3 1 Mar 7 8 9
#> 4 2 Jan 1 1 1
#> 5 2 Feb 2 3 4
#> 6 2 Mar 7 8 8
# Gather and calculate percent change
df %>%
  gather(key = key, value = value, V1:V3) %>%
  group_by(ID, key) %>%
  mutate(lag = lag(value)) %>%
  mutate(pct.change = (value - lag) / lag)
#> Source: local data frame [18 x 6]
#> Groups: ID, key [6]
#>
#> ID Date key value lag pct.change
#> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 Jan V1 2 NA NA
#> 2 1 Feb V1 3 2 0.5000000
#> 3 1 Mar V1 7 3 1.3333333
#> 4 2 Jan V1 1 NA NA
#> 5 2 Feb V1 2 1 1.0000000
#> 6 2 Mar V1 7 2 2.5000000
#> 7 1 Jan V2 3 NA NA
#> 8 1 Feb V2 4 3 0.3333333
#> 9 1 Mar V2 8 4 1.0000000
#> 10 2 Jan V2 1 NA NA
#> 11 2 Feb V2 3 1 2.0000000
#> 12 2 Mar V2 8 3 1.6666667
#> 13 1 Jan V3 5 NA NA
#> 14 1 Feb V3 6 5 0.2000000
#> 15 1 Mar V3 9 6 0.5000000
#> 16 2 Jan V3 1 NA NA
#> 17 2 Feb V3 4 1 3.0000000
#> 18 2 Mar V3 8 4 1.0000000
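In current tidyr releases gather() is superseded by pivot_longer(), so the same reshaping can be sketched as follows. This is an equivalent under that assumption, not a change to the approach; it rebuilds the example data with base data.frame() so that it runs on its own, and the object name long is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt so the snippet is self-contained.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  V1   = c(2, 3, 7, 1, 2, 7),
  V2   = c(3, 4, 8, 1, 3, 8),
  V3   = c(5, 6, 9, 1, 4, 8)
)

long <- df %>%
  # Wide to long: one row per ID / Date / variable.
  pivot_longer(V1:V3, names_to = "key", values_to = "value") %>%
  group_by(ID, key) %>%
  # Rows are assumed to be in month order already within each ID.
  mutate(pct.change = (value - dplyr::lag(value)) / dplyr::lag(value)) %>%
  ungroup()

long
```

The rows come out in a different order than with gather() (variables are interleaved within each month), but the values within each ID and key group are the same.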
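How about using
pct <- function(x) x/lag(x)
or (x/lag(x) - 1)*100, or however you wish to specify the percent change exactly? Note that lag() here has to be dplyr's lag(), so dplyr must be loaded; the version of lag() in base R's stats package does not shift a plain vector this way.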
e.g.,
pct(1:3)
[1] NA 2.0 1.5
Edit: Adding Frank's suggestion
pct <- function(x) {x/lag(x)}
dt %>% group_by(ID) %>% mutate_each(funs(pct), c(V1, V2, V3))
ID Date V1 V2 V3
1 Jan NA NA NA
1 Feb 1.500000 1.333333 1.2
1 Mar 2.333333 2.000000 1.5
2 Jan NA NA NA
2 Feb 2.000000 3.000000 4.0
2 Mar 3.500000 2.666667 2.0
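Note that the output above is the ratio x/lag(x), not yet a percent change. mutate_each() and funs() are also deprecated in current dplyr releases. A minimal sketch of the same wide-format idea with across() (available from dplyr 1.0.0), using the (x/lag(x) - 1)*100 form, could look like this. The names pct_change and result are made up for illustration, and dt is rebuilt here from the example data.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained.
dt <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  V1   = c(2, 3, 7, 1, 2, 7),
  V2   = c(3, 4, 8, 1, 3, 8),
  V3   = c(5, 6, 9, 1, 4, 8)
)

# Percent change relative to the previous row, in percent.
# dplyr::lag is named explicitly to avoid picking up stats::lag.
pct_change <- function(x) (x / dplyr::lag(x) - 1) * 100

result <- dt %>%
  group_by(ID) %>%
  # mutate() keeps one row per input row, which a per-row function needs.
  mutate(across(V1:V3, pct_change)) %>%
  ungroup()

result
```

This also suggests why the attempt with summarise_each() and Delt() did not work while sum() did: summarise collapses each group to a single row, so it suits functions like sum() that return one value per group, whereas a percent change returns one value per row and belongs in mutate().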