I'm trying to calculate some percentage change concisely using data.table, but I've got some trouble understanding entirely how the .SD operations works.
Let's say I have the following table
dt = structure(list(type = c("A", "A", "A", "B", "B", "B"), Year = c(2000L,
2005L, 2010L, 2000L, 2005L, 2010L), alpha = c(0.0364325563237498,
0.0401968159729988, 0.0357395587861466, 0.0317236054181487, 0.0328213742235379,
0.0294694430578336), beta = c(0.0364325563237498, 0.0401968159729988,
0.0357395587861466, 0.0317236054181487, 0.0328213742235379, 0.0294694430578336
)), .Names = c("type", "Year", "alpha", "beta"), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
> dt
## type Year alpha beta
## 1: A 2000 0.03643256 0.03643256
## 2: A 2005 0.04019682 0.04019682
## 3: A 2010 0.03573956 0.03573956
## 4: B 2000 0.03172361 0.03172361
## 5: B 2005 0.03282137 0.03282137
## 6: B 2010 0.02946944 0.02946944
To calculate the percentage change on alpha, by category, I came up with the following code:
dt[,change:=list(lapply(3:2,function(x)(.SD[x,alpha]/.SD[
(x-1),alpha]))),by=list(type)][][Year==2000,change:=NA]
But something tells me their could be a more concise way of doing it. In particular if one would want to perform the percentage change for both columns the following would not work
dt[,c("changeAlpha","changeBeta"):=list(lapply(3:2,
function(x)(.SD[x]/.SD[(x-1)]))),by=list(type)][Year==2000,change:=NA][]
So I resorted to:
dt[,c("changeAlpha","changeBeta"):=list(
lapply(3:2,function(x)(.SD[x,alpha]/.SD[(x-1),alpha])),
lapply(3:2,function(x)(.SD[x,beta]/.SD[(x-1),beta]))),by=list(type)][
Year==2000,c("changeAlpha","changeBeta"):=list(NA,NA)][]
## type Year alpha beta changeAlpha changeBeta
## 1: A 2000 0.03643256 0.03643256 NA NA
## 2: A 2005 0.04019682 0.04019682 1.10332131557826 1.10332131557826
## 3: A 2010 0.03573956 0.03573956 0.889114172877617 0.889114172877617
## 4: B 2000 0.03172361 0.03172361 NA NA
## 5: B 2005 0.03282137 0.03282137 1.03460416276522 1.03460416276522
## 6: B 2010 0.02946944 0.02946944 0.897873527693412 0.897873527693412
But the operations seems right but got a lot of warnings which lead me here.
To calculate a percentage decrease, first work out the difference (decrease) between the two numbers you are comparing. Next, divide the decrease by the original number and multiply the answer by 100. The result expresses the change as a percentage—i.e., the percentage change.
Knowledge of percentage change is also important for cracking data interpretation questions. The formula that we use for percentage change from P to Q is: 100 * (Q - P)/P.
You could use the shift
function from data.table v1.9.6+
Define your function
myFunc <- function(x) x/shift(x)
Select the columns you want to calculate the percentage for
cols <- c("alpha", "beta")
Or if you want to run this on all the colunms except the first two
cols <- names(dt)[-(1:2)]
Run the function over the columns
dt[, paste0("change", cols) := lapply(.SD, myFunc), by = type, .SDcols = cols][]
# type Year alpha beta changealpha changebeta
# 1: A 2000 0.03643256 0.03643256 NA NA
# 2: A 2005 0.04019682 0.04019682 1.1033213 1.1033213
# 3: A 2010 0.03573956 0.03573956 0.8891142 0.8891142
# 4: B 2000 0.03172361 0.03172361 NA NA
# 5: B 2005 0.03282137 0.03282137 1.0346042 1.0346042
# 6: B 2010 0.02946944 0.02946944 0.8978735 0.8978735
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With