I want to calculate the percentage of Profit
by YEAR
which is a fairly simple task but somehow I am getting NA
. I have checked same questions asked before but I'm not able to understand why I am getting NA
. The data is as follows:
> df_vertical_growth
YEAR VERTICAL Profit pct_change
1 2017 AGRICULTURE 0 NA
2 2016 AGRICULTURE 2053358 NA
3 2015 AGRICULTURE 0 NA
4 2014 AGRICULTURE 2370747 NA
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 NA
7 2016 COMMUNICATION 1680074 NA
8 2015 COMMUNICATION 1322470 NA
9 2014 COMMUNICATION 1460133 NA
10 2013 COMMUNICATION 1529863 NA
11 2017 CONSTRUCTION 0 NA
12 2016 CONSTRUCTION 0 NA
13 2015 CONSTRUCTION 0 NA
14 2014 CONSTRUCTION 8250149 NA
15 2013 CONSTRUCTION 0 NA
16 2017 EDUCATION 0 NA
17 2016 EDUCATION 12497015 NA
18 2015 EDUCATION 13437356 NA
19 2014 EDUCATION 10856685 NA
20 2013 EDUCATION 13881127 NA
21 2017 FINANCE, INSURANCE, REAL ESTATE 0 NA
22 2016 FINANCE, INSURANCE, REAL ESTATE 0 NA
23 2015 FINANCE, INSURANCE, REAL ESTATE 0 NA
24 2014 FINANCE, INSURANCE, REAL ESTATE 0 NA
25 2013 FINANCE, INSURANCE, REAL ESTATE 5008436 NA
26 2017 HEALTHCARE 0 NA
27 2016 HEALTHCARE 0 NA
28 2015 HEALTHCARE 0 NA
29 2014 HEALTHCARE 4554364 NA
30 2013 HEALTHCARE 5078130 NA
31 2017 HOSPITALITY 0 NA
32 2016 HOSPITALITY 4445512 NA
33 2015 HOSPITALITY 5499419 NA
34 2014 HOSPITALITY 9060639 NA
35 2013 HOSPITALITY 4391522 NA
36 2017 MANUFACTURING 0 NA
37 2016 MANUFACTURING 0 NA
38 2015 MANUFACTURING 0 NA
39 2014 MANUFACTURING 0 NA
40 2013 MANUFACTURING 27466974 NA
41 2017 MINING 0 NA
42 2016 MINING 4359251 NA
43 2015 MINING 4163201 NA
44 2014 MINING 6272530 NA
45 2013 MINING 6668191 NA
46 2017 OTHER 0 NA
47 2016 OTHER 0 NA
48 2015 OTHER 0 NA
49 2014 OTHER 5935199 NA
50 2013 OTHER 3585969 NA
51 2017 PUBLIC ADMIN 0 NA
52 2016 PUBLIC ADMIN 0 NA
53 2015 PUBLIC ADMIN 0 NA
54 2014 PUBLIC ADMIN 0 NA
55 2013 PUBLIC ADMIN 0 NA
56 2017 RETAIL TRADE 0 NA
57 2016 RETAIL TRADE 0 NA
58 2015 RETAIL TRADE 0 NA
59 2014 RETAIL TRADE 0 NA
60 2013 RETAIL TRADE 0 NA
61 2017 SERVICE 0 NA
62 2016 SERVICE 0 NA
63 2015 SERVICE 0 NA
64 2014 SERVICE 0 NA
65 2013 SERVICE 28018522 NA
66 2017 TRANSPORTATION 0 NA
67 2016 TRANSPORTATION 0 NA
68 2015 TRANSPORTATION 0 NA
69 2014 TRANSPORTATION 0 NA
70 2013 TRANSPORTATION 8430244 NA
71 2017 UTILITY 0 NA
72 2016 UTILITY 3551989 NA
73 2015 UTILITY 6535248 NA
74 2014 UTILITY 3995486 NA
75 2013 UTILITY 4477617 NA
76 2017 WHOLESALE TRADE 0 NA
77 2016 WHOLESALE TRADE 6898041 NA
78 2015 WHOLESALE TRADE 7120828 NA
79 2014 WHOLESALE TRADE 0 NA
80 2013 WHOLESALE TRADE 0 NA
My Code:
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>%
mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))
Now, based on the answers provided here How can I calculate the percentage change within a group for multiple columns in R?, also tried doing the following:
pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)
But I am getting following error:
Error in check_dot_cols(.vars, .cols) : object 'Profit' not found
Can someone please tell me, what am I doing wrong? Thanks a lot in advance.
The problem lies in the fact each group has one observation. One unique year per Vertical. What is the lag of one observation? Additionally since the years go in descending order I trust you need lead.
library(tidyverse)
z %>%
group_by(VERTICAL) %>%
mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2017 AGRICULTURE 0 -100
2 2016 AGRICULTURE 2053358 Inf
3 2015 AGRICULTURE 0 -100
4 2014 AGRICULTURE 2370747 - 41.7
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 -100
7 2016 COMMUNICATION 1680074 27.0
8 2015 COMMUNICATION 1322470 - 9.43
9 2014 COMMUNICATION 1460133 - 4.56
10 2013 COMMUNICATION 1529863 NA
This solution assumes the years are arranged in the correct order, to make sure:
z %>%
group_by(VERTICAL) %>%
arrange(YEAR, .by_group = TRUE) %>%
mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2013 AGRICULTURE 4066693 NA
2 2014 AGRICULTURE 2370747 - 41.7
3 2015 AGRICULTURE 0 -100
4 2016 AGRICULTURE 2053358 Inf
5 2017 AGRICULTURE 0 -100
6 2013 COMMUNICATION 1529863 NA
7 2014 COMMUNICATION 1460133 - 4.56
8 2015 COMMUNICATION 1322470 - 9.43
9 2016 COMMUNICATION 1680074 27.0
10 2017 COMMUNICATION 0 -100
or use
arrange(desc(YEAR), .by_group = TRUE)
and lead
z is:
structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L,
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L,
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L,
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L,
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION",
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING",
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"),
Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L,
1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L,
12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L,
5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L,
0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L,
6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L,
28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L,
3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36",
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47",
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")
Assuming that your Profit
column represents the profit in a given year, this function will calculate the difference between year n and year n-1, divide by the value of year n-1, and multiply by 100 to get a percentage. If the value in year n-1 was zero, there is no valid percent change. It is important that you group the data only by VERTICAL
and not by YEAR
as well.
profit_pct_change <- function(x) {
x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}
df_vertical_growth %>%
group_by(VERTICAL) %>%
do(profit_pct_change(.))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With