Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data.table operation with .SD: calculating percentage change concisely

Tags:

r

data.table

I'm trying to calculate some percentage change concisely using data.table, but I've got some trouble understanding entirely how the .SD operations works.

Let's say I have the following table

dt = structure(list(type = c("A", "A", "A", "B", "B", "B"), Year = c(2000L, 
2005L, 2010L, 2000L, 2005L, 2010L), alpha = c(0.0364325563237498, 
0.0401968159729988, 0.0357395587861466, 0.0317236054181487, 0.0328213742235379, 
0.0294694430578336), beta = c(0.0364325563237498, 0.0401968159729988, 
0.0357395587861466, 0.0317236054181487, 0.0328213742235379, 0.0294694430578336
)), .Names = c("type", "Year", "alpha", "beta"), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))


> dt
##    type Year      alpha       beta
## 1:    A 2000 0.03643256 0.03643256
## 2:    A 2005 0.04019682 0.04019682
## 3:    A 2010 0.03573956 0.03573956
## 4:    B 2000 0.03172361 0.03172361
## 5:    B 2005 0.03282137 0.03282137
## 6:    B 2010 0.02946944 0.02946944

To calculate the percentage change on alpha, by category, I came up with the following code:

dt[,change:=list(lapply(3:2,function(x)(.SD[x,alpha]/.SD[
(x-1),alpha]))),by=list(type)][][Year==2000,change:=NA]   

But something tells me their could be a more concise way of doing it. In particular if one would want to perform the percentage change for both columns the following would not work

dt[,c("changeAlpha","changeBeta"):=list(lapply(3:2,
function(x)(.SD[x]/.SD[(x-1)]))),by=list(type)][Year==2000,change:=NA][]

So I resorted to:

dt[,c("changeAlpha","changeBeta"):=list(
lapply(3:2,function(x)(.SD[x,alpha]/.SD[(x-1),alpha])),
lapply(3:2,function(x)(.SD[x,beta]/.SD[(x-1),beta]))),by=list(type)][
Year==2000,c("changeAlpha","changeBeta"):=list(NA,NA)][]

##        type Year      alpha       beta       changeAlpha        changeBeta
## 1:    A 2000 0.03643256 0.03643256                NA                NA
## 2:    A 2005 0.04019682 0.04019682  1.10332131557826  1.10332131557826
## 3:    A 2010 0.03573956 0.03573956 0.889114172877617 0.889114172877617
## 4:    B 2000 0.03172361 0.03172361                NA                NA
## 5:    B 2005 0.03282137 0.03282137  1.03460416276522  1.03460416276522
## 6:    B 2010 0.02946944 0.02946944 0.897873527693412 0.897873527693412

But the operations seems right but got a lot of warnings which lead me here.

  • Is my method of thinking completely wrong or is it the right way to proceed for this operation?
like image 348
DJJ Avatar asked Mar 14 '15 23:03

DJJ


People also ask

What is the simple percentage change method?

To calculate a percentage decrease, first work out the difference (decrease) between the two numbers you are comparing. Next, divide the decrease by the original number and multiply the answer by 100. The result expresses the change as a percentage—i.e., the percentage change.

How do you calculate percent increase or decrease in data interpretation?

Knowledge of percentage change is also important for cracking data interpretation questions. The formula that we use for percentage change from P to Q is: 100 * (Q - P)/P.


1 Answers

You could use the shift function from data.table v1.9.6+

Define your function

myFunc <- function(x) x/shift(x)

Select the columns you want to calculate the percentage for

cols <- c("alpha", "beta")

Or if you want to run this on all the colunms except the first two

cols <- names(dt)[-(1:2)]

Run the function over the columns

dt[, paste0("change", cols) := lapply(.SD, myFunc), by = type, .SDcols = cols][]
#    type Year      alpha       beta changealpha changebeta
# 1:    A 2000 0.03643256 0.03643256          NA         NA
# 2:    A 2005 0.04019682 0.04019682   1.1033213  1.1033213
# 3:    A 2010 0.03573956 0.03573956   0.8891142  0.8891142
# 4:    B 2000 0.03172361 0.03172361          NA         NA
# 5:    B 2005 0.03282137 0.03282137   1.0346042  1.0346042
# 6:    B 2010 0.02946944 0.02946944   0.8978735  0.8978735
like image 89
David Arenburg Avatar answered Sep 26 '22 02:09

David Arenburg