 

Calculate relative change in time by group

I am struggling to calculate the percentage change in annual net sales for each company while taking NAs into account.

Here's a sample of the data:

library(data.table)

dt <- data.table(lpermno = c(10065, 10065, 10065, 10065, 59328, 61241, 59328, 61241, 59328, 61241, 59328, 61241),
                 fyear = c(2001, 2002, 2003, 2004, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004),
                 sale = c(NA, NA, NA, NA, 26539, 3891.754, 26764, 2697.029, 30141, 3519.168, 34209, 5001.435))

lpermno fyear sale
10065   2001    NA
10065   2002    NA
10065   2003    NA
10065   2004    NA
59328   2001    26539.000
59328   2002    26764.000
59328   2003    30141.000
59328   2004    34209.000
61241   2001    3891.754
61241   2002    2697.029
61241   2003    3519.168
61241   2004    5001.435

I'd like to calculate a new variable called sales_change: the change in sale from one year to the next within each company, expressed as the ratio sale[n] / sale[n-1]. For the first observation of each company, sales_change should simply be 1.
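Written out, the requested quantity is a year-over-year growth ratio; a percentage change in the strict sense follows by subtracting 1 and multiplying by 100. In the sketch below, s_{i,t} (sales of company i in fiscal year t) and t_0(i) (first year observed for company i) are notation introduced here, not taken from the original post. The worked numbers use the sample data for lpermno 59328 and 61241 in 2002.

```latex
\text{sales\_change}_{i,t} =
\begin{cases}
1, & t = t_0(i) \\[4pt]
\dfrac{s_{i,t}}{s_{i,t-1}}, & t > t_0(i)
\end{cases}
\qquad
\text{pct\_change}_{i,t} = 100\,\bigl(\text{sales\_change}_{i,t} - 1\bigr)

\text{lpermno 59328, 2002: } \tfrac{26764}{26539} \approx 1.008478 \;\Rightarrow\; \approx +0.85\%
\text{lpermno 61241, 2002: } \tfrac{2697.029}{3891.754} \approx 0.693011 \;\Rightarrow\; \approx -30.70\%
```

If a company's sales are NA, the ratio is NA as well, which matches the desired output for lpermno 10065.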

I've read the following posts for guidance, but none of them solved my problem:

  1. Calculate first difference by group in R
    • Calculates the difference, not the percentage change.
  2. Calculate difference between values by group and matched for time
    • After adapting the code, I get only NAs as output.
  3. R: how to find percent diff between columns and naming accordingly?
    • This code creates a new column for each sales period; since I am working with many years, that is not practical.
  4. Calculate percentage change in an R data frame
  5. How to calculate percentage change from different rows over different spans
  6. Calculate relative changes in a time series with respect to a baseline by group. NA if no baseline value was measured
    • Computes changes relative to a fixed baseline, which is not what I am looking for.
  7. Rolling mean (moving average) by group/id with dplyr
    • Looks like an elegant solution for calculating a mean; however, I need the percentage change.

For the example data above, the desired output would be:

output <- data.table(lpermno = c(10065, 10065, 10065, 10065, 59328, 59328, 59328, 59328, 61241, 61241, 61241, 61241),
                     fyear = c(2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004),
                     sale = c(NA, NA, NA, NA, 26539, 26764, 30141, 34209, 3891.754, 2697.029, 3519.168, 5001.435),
                     output = c(NA, NA, NA, NA, 1, 1.008478, 1.126177, 1.134966, 1, 0.693011, 1.304831, 1.421198))

lpermno fyear sale output
10065   2001    NA  NA
10065   2002    NA  NA
10065   2003    NA  NA
10065   2004    NA  NA
59328   2001    26539.000   1.000000
59328   2002    26764.000   1.008478
59328   2003    30141.000   1.126177
59328   2004    34209.000   1.134966
61241   2001    3891.754    1.000000
61241   2002    2697.029    0.693011
61241   2003    3519.168    1.304831
61241   2004    5001.435    1.421198

I'd appreciate some assistance. Thanks in advance.

Patrick asked Jun 23 '19 12:06




1 Answer

Using data.table, you could do the following:

dt[, pctchnge := sale / c(sale[1], head(sale, -1)), by="lpermno"][order(lpermno)]

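Here, := creates the new variable pctchnge by reference. For the denominator, c(sale[1], head(sale, -1)) repeats the first year's sales and drops the final year with head(), so each year is divided by the previous year and the first year by itself (giving 1). The calculation is performed separately for each lpermno, and the result is then sorted by lpermno. This relies on the rows already being in fiscal-year order within each company, as they are in the sample data.

This returns: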

    lpermno fyear      sale  pctchnge
 1:   10065  2001        NA        NA
 2:   10065  2002        NA        NA
 3:   10065  2003        NA        NA
 4:   10065  2004        NA        NA
 5:   59328  2001 26539.000 1.0000000
 6:   59328  2002 26764.000 1.0084781
 7:   59328  2003 30141.000 1.1261770
 8:   59328  2004 34209.000 1.1349657
 9:   61241  2001  3891.754 1.0000000
10:   61241  2002  2697.029 0.6930112
11:   61241  2003  3519.168 1.3048314
12:   61241  2004  5001.435 1.4211981
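The same lag-and-divide idea can also be written with data.table's shift(), which lags a column by one row within each group. This is a sketch, not part of the original answer: it sorts explicitly with setorder() so the result does not depend on the input row order, and it names the column sales_change as in the question. It assumes each company has consecutive fiscal years with no gaps, because shift() lags by row, not by year.

```r
library(data.table)

dt <- data.table(
  lpermno = c(10065, 10065, 10065, 10065, 59328, 61241, 59328, 61241,
              59328, 61241, 59328, 61241),
  fyear   = c(2001, 2002, 2003, 2004, 2001, 2001, 2002, 2002,
              2003, 2003, 2004, 2004),
  sale    = c(NA, NA, NA, NA, 26539, 3891.754, 26764, 2697.029,
              30141, 3519.168, 34209, 5001.435)
)

# Put each company's years in chronological order (sorts by reference).
setorder(dt, lpermno, fyear)

# shift() lags sale by one row within each lpermno group. Filling the first
# slot with the group's first sale makes the first ratio sale[1] / sale[1] = 1.
# A company whose sales are all NA gets NA / NA = NA throughout.
dt[, sales_change := sale / shift(sale, n = 1L, fill = sale[1L]), by = lpermno]

print(dt)
```

If fiscal years can be missing, one option is to join each row to the row for fyear - 1 instead of lagging by position, so that gaps produce NA rather than a multi-year ratio.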
lmo answered Nov 03 '22 18:11
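Without data.table, the answer's denominator trick carries over to base R via ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in base-R technique, not the answer's code. The data frame name sales_df is hypothetical, and, as above, each company's fiscal years are assumed to be consecutive.

```r
# Base R only: no packages required.
sales_df <- data.frame(
  lpermno = c(10065, 10065, 10065, 10065, 59328, 61241, 59328, 61241,
              59328, 61241, 59328, 61241),
  fyear   = c(2001, 2002, 2003, 2004, 2001, 2001, 2002, 2002,
              2003, 2003, 2004, 2004),
  sale    = c(NA, NA, NA, NA, 26539, 3891.754, 26764, 2697.029,
              30141, 3519.168, 34209, 5001.435)
)

# Sort so that each company's years are in chronological order.
sales_df <- sales_df[order(sales_df$lpermno, sales_df$fyear), ]

# Within each lpermno group, divide by (first value, then all but the last),
# i.e. the previous year's sale; the first year is divided by itself (= 1).
sales_df$sales_change <- ave(sales_df$sale, sales_df$lpermno,
                             FUN = function(x) x / c(x[1], head(x, -1)))

print(sales_df)
```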