
Simple sum if expression

These are my sample data:

library(data.table)
dt <- data.table(id=c("a","a","a","a","b","b"), monthsinarrears=c(0,1,0,0,1,0), date=c(2013,2014,2015,2016,2014,2015))

The table looks like this:

> dt
   id monthsinarrears date
1:  a               0 2013
2:  a               1 2014
3:  a               0 2015
4:  a               0 2016
5:  b               1 2014
6:  b               0 2015

Now I want to create an additional column called "EverinArrears", set to 1 if the id has ever been in arrears up to and including that date (historically) and 0 if it hasn't. Thus the output I want to obtain is:

   id monthsinarrears date EverinArrears
1:  a               0 2013             0
2:  a               1 2014             1
3:  a               0 2015             1
4:  a               0 2016             1
5:  b               1 2014             1
6:  b               0 2015             1

Note that loan id a had not yet been in arrears in 2013 (that first happened in 2014), which is why EverinArrears is 0 in 2013.

Asked Oct 05 '15 08:10 by Dave van Brecht


2 Answers

You can do the following (thanks to @Roland for the hint on avoiding values > 1):

dt[, EverinArrears := as.integer(as.logical(cumsum(monthsinarrears))), by=id]

Output:

#   id monthsinarrears date EverinArrears
#1:  a               0 2013             0
#2:  a               1 2014             1
#3:  a               0 2015             1
#4:  a               0 2016             1
#5:  b               1 2014             1
#6:  b               0 2015             1

Note: if you prefer shorter code, you can also do

dt[, EverinArrears := +(!!(cumsum(monthsinarrears))), by=id]

although it is not as "good practice" as as.integer(as.logical(...)).

As mentioned by @Jaap, you can also do:

dt[, EverinArrears := +(cumsum(monthsinarrears) > 0), by = id]

or, for better practice:

dt[, EverinArrears := as.integer(cumsum(monthsinarrears) > 0), by = id]

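As suggested by @Arun in the comments, another, simpler way (this yields a 0/1 flag only because monthsinarrears is 0 or 1 in the sample data; larger counts would be carried forward as-is):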

dt[, EverinArrears := cummax(monthsinarrears), by = id]
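Inside `dt[, ..., by = id]` each expression sees one id's vector at a time, so the variants above can be compared on a plain vector in base R. A minimal check, using a hypothetical vector `x` that contains a month count greater than 1 (unlike the sample data):

```r
# Hypothetical months-in-arrears history for a single id (note the 3)
x <- c(0, 3, 0, 0)

v1 <- as.integer(as.logical(cumsum(x)))  # cumsum counts arrears so far; nonzero -> TRUE -> 1
v2 <- as.integer(cumsum(x) > 0)          # same flag via a comparison
v3 <- cummax(x)                          # running maximum of the raw counts
v4 <- cummax(x > 0)                      # running maximum of the "in arrears" indicator

v1  # 0 1 1 1
v2  # 0 1 1 1
v3  # 0 3 3 3 (the count is carried forward, not a 0/1 flag)
v4  # 0 1 1 1
```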
Answered Oct 22 '22 05:10 by Cath


Here's a slight variation on the others' answers:

dt[, newcol := cummax(monthsinarrears > 0), by=id]

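By using cummax instead of cumsum, we might save some computation: cummax on the logical vector already returns 0/1 integers, so no further as.integer(...) or > 0 conversion is needed.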
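For readers without data.table, here is a minimal base-R sketch of the same cummax idea, swapping in `ave()` to apply it within each id. It assumes the flag should follow date order within each id, as in the sample data; the explicit `order()` call makes that assumption visible, since a running maximum depends on row order.

```r
# Sample data from the question as a plain data.frame
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b"),
  monthsinarrears = c(0, 1, 0, 0, 1, 0),
  date = c(2013, 2014, 2015, 2016, 2014, 2015)
)

# The flag is a running value, so sort by date within each id first
df <- df[order(df$id, df$date), ]

# Running maximum of the 0/1 "in arrears" indicator within each id:
# once it reaches 1 it stays 1 for all later dates of that id
df$EverinArrears <- ave(as.integer(df$monthsinarrears > 0), df$id, FUN = cummax)

df
```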


And here's a way of comparing against the position of the first entry with positive months in arrears:

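dt[, newcol := {
  z = which(monthsinarrears > 0)               # rows of this id that are in arrears
  if (!length(z)) rep(0L,.N)                   # never in arrears: all 0
  else            replace(rep(1L,.N), 1:.N < z[1], 0L)  # 0 before the first arrears row, 1 from it on
}, by=id]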

I'm not sure whether that is any more efficient; it probably depends on the data to some extent.
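One way to gain confidence that the two approaches agree is to compare them on a single id's vector. The sketch below wraps the which()-based logic in a hypothetical helper, `flag_from_first()`, and checks it against `cummax(x > 0)` on randomly generated months-in-arrears values (random test data, not from the question). It only checks equivalence and says nothing about which version is faster.

```r
# Hypothetical helper: the which()-based logic from the answer, for one id's vector
flag_from_first <- function(x) {
  z <- which(x > 0)                   # positions where the id is in arrears
  n <- length(x)
  if (!length(z)) return(rep(0L, n))  # never in arrears: all zeros
  replace(rep(1L, n), seq_len(n) < z[1], 0L)  # 0 before the first arrears, 1 from then on
}

# Compare against the cummax version on random vectors of length 1 to 10
set.seed(42)
for (i in 1:100) {
  x <- sample(0:3, sample(1:10, 1), replace = TRUE)
  stopifnot(all(flag_from_first(x) == cummax(x > 0)))
}

flag_from_first(c(0, 0, 2, 0))  # 0 0 1 1
flag_from_first(c(0, 0, 0))     # 0 0 0
```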

Answered Oct 22 '22 05:10 by Frank