Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate Percentage Change in R using dplyr

Tags:

r

dplyr

I want to calculate the percentage of Profit by YEAR which is a fairly simple task but somehow I am getting NA. I have checked same questions asked before but I'm not able to understand why I am getting NA. The data is as follows:

> df_vertical_growth
   YEAR                        VERTICAL   Profit pct_change
1  2017                     AGRICULTURE        0         NA
2  2016                     AGRICULTURE  2053358         NA
3  2015                     AGRICULTURE        0         NA
4  2014                     AGRICULTURE  2370747         NA
5  2013                     AGRICULTURE  4066693         NA
6  2017                   COMMUNICATION        0         NA
7  2016                   COMMUNICATION  1680074         NA
8  2015                   COMMUNICATION  1322470         NA
9  2014                   COMMUNICATION  1460133         NA
10 2013                   COMMUNICATION  1529863         NA
11 2017                    CONSTRUCTION        0         NA
12 2016                    CONSTRUCTION        0         NA
13 2015                    CONSTRUCTION        0         NA
14 2014                    CONSTRUCTION  8250149         NA
15 2013                    CONSTRUCTION        0         NA
16 2017                       EDUCATION        0         NA
17 2016                       EDUCATION 12497015         NA
18 2015                       EDUCATION 13437356         NA
19 2014                       EDUCATION 10856685         NA
20 2013                       EDUCATION 13881127         NA
21 2017 FINANCE, INSURANCE, REAL ESTATE        0         NA
22 2016 FINANCE, INSURANCE, REAL ESTATE        0         NA
23 2015 FINANCE, INSURANCE, REAL ESTATE        0         NA
24 2014 FINANCE, INSURANCE, REAL ESTATE        0         NA
25 2013 FINANCE, INSURANCE, REAL ESTATE  5008436         NA
26 2017                      HEALTHCARE        0         NA
27 2016                      HEALTHCARE        0         NA
28 2015                      HEALTHCARE        0         NA
29 2014                      HEALTHCARE  4554364         NA
30 2013                      HEALTHCARE  5078130         NA
31 2017                     HOSPITALITY        0         NA
32 2016                     HOSPITALITY  4445512         NA
33 2015                     HOSPITALITY  5499419         NA
34 2014                     HOSPITALITY  9060639         NA
35 2013                     HOSPITALITY  4391522         NA
36 2017                   MANUFACTURING        0         NA
37 2016                   MANUFACTURING        0         NA
38 2015                   MANUFACTURING        0         NA
39 2014                   MANUFACTURING        0         NA
40 2013                   MANUFACTURING 27466974         NA
41 2017                          MINING        0         NA
42 2016                          MINING  4359251         NA
43 2015                          MINING  4163201         NA
44 2014                          MINING  6272530         NA
45 2013                          MINING  6668191         NA
46 2017                           OTHER        0         NA
47 2016                           OTHER        0         NA
48 2015                           OTHER        0         NA
49 2014                           OTHER  5935199         NA
50 2013                           OTHER  3585969         NA
51 2017                    PUBLIC ADMIN        0         NA
52 2016                    PUBLIC ADMIN        0         NA
53 2015                    PUBLIC ADMIN        0         NA
54 2014                    PUBLIC ADMIN        0         NA
55 2013                    PUBLIC ADMIN        0         NA
56 2017                    RETAIL TRADE        0         NA
57 2016                    RETAIL TRADE        0         NA
58 2015                    RETAIL TRADE        0         NA
59 2014                    RETAIL TRADE        0         NA
60 2013                    RETAIL TRADE        0         NA
61 2017                         SERVICE        0         NA
62 2016                         SERVICE        0         NA
63 2015                         SERVICE        0         NA
64 2014                         SERVICE        0         NA
65 2013                         SERVICE 28018522         NA
66 2017                  TRANSPORTATION        0         NA
67 2016                  TRANSPORTATION        0         NA
68 2015                  TRANSPORTATION        0         NA
69 2014                  TRANSPORTATION        0         NA
70 2013                  TRANSPORTATION  8430244         NA
71 2017                         UTILITY        0         NA
72 2016                         UTILITY  3551989         NA
73 2015                         UTILITY  6535248         NA
74 2014                         UTILITY  3995486         NA
75 2013                         UTILITY  4477617         NA
76 2017                 WHOLESALE TRADE        0         NA
77 2016                 WHOLESALE TRADE  6898041         NA
78 2015                 WHOLESALE TRADE  7120828         NA
79 2014                 WHOLESALE TRADE        0         NA
80 2013                 WHOLESALE TRADE        0         NA

My Code:

df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% 
     mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))

Now, based on the answers provided here How can I calculate the percentage change within a group for multiple columns in R?, also tried doing the following:

pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)

But I am getting following error:

Error in check_dot_cols(.vars, .cols) : object 'Profit' not found

Can someone please tell me, what am I doing wrong? Thanks a lot in advance.

like image 828
Krishnang K Dalal Avatar asked Jan 10 '18 21:01

Krishnang K Dalal


2 Answers

The problem lies in the fact each group has one observation. One unique year per Vertical. What is the lag of one observation? Additionally since the years go in descending order I trust you need lead.

library(tidyverse)
z %>%
  group_by(VERTICAL) %>% 
  mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2017 AGRICULTURE         0    -100   
 2  2016 AGRICULTURE   2053358     Inf   
 3  2015 AGRICULTURE         0    -100   
 4  2014 AGRICULTURE   2370747    - 41.7 
 5  2013 AGRICULTURE   4066693      NA   
 6  2017 COMMUNICATION       0    -100   
 7  2016 COMMUNICATION 1680074      27.0 
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2014 COMMUNICATION 1460133    -  4.56
10  2013 COMMUNICATION 1529863      NA   

This solution assumes the years are arranged in the correct order, to make sure:

z %>%
  group_by(VERTICAL) %>% 
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2013 AGRICULTURE   4066693      NA   
 2  2014 AGRICULTURE   2370747    - 41.7 
 3  2015 AGRICULTURE         0    -100   
 4  2016 AGRICULTURE   2053358     Inf   
 5  2017 AGRICULTURE         0    -100   
 6  2013 COMMUNICATION 1529863      NA   
 7  2014 COMMUNICATION 1460133    -  4.56
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2016 COMMUNICATION 1680074      27.0 
10  2017 COMMUNICATION       0    -100   

or use

arrange(desc(YEAR), .by_group = TRUE)

and lead

z is:

structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION", 
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING", 
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"), 
    Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L, 
    1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L, 
    12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L, 
    5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L, 
    0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L, 
    6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L, 
    28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L, 
    3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")
like image 143
missuse Avatar answered Oct 25 '22 23:10

missuse


Assuming that your Profit column represents the profit in a given year, this function will calculate the difference between year n and year n-1, divide by the value of year n-1, and multiply by 100 to get a percentage. If the value in year n-1 was zero, there is no valid percent change. It is important that you group the data only by VERTICAL and not by YEAR as well.

profit_pct_change <- function(x) {
  x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
  pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
  data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}

df_vertical_growth %>% 
  group_by(VERTICAL) %>%
  do(profit_pct_change(.))
like image 30
qdread Avatar answered Oct 25 '22 23:10

qdread