Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More efficient ways to use R than 'for' loops

I'm a relative newcomer to R so I'm sorry if there's an obvious answer to this. I've looked at other questions and I think 'apply' is the answer but I can't work out how to use it in this case.

I've got a longitudinal survey where participants are invited every year. In some years they fail to take part, and sometimes they die. I need to identify which participants have taken part for a consistent 'streak' since from the start of the survey (i.e. if they stop, they stop for good).

I've done this with a 'for' loop, which works fine in the example below. But I have many years and many participants, and the loop is very slow. Is there a faster approach I could use?

In the example, TRUE means they participated in that year. The loop creates two vectors - 'finalyear' for the last year they took part, and 'streak' to show if they completed all years before the finalyear (i.e. cases 1, 3 and 5).

dat <- data.frame(ids = 1:5, "1999" = c(T, T, T, F, T), "2000" = c(T, F, T, F, T), "2001" = c(T, T, T, T, T), "2002" = c(F, T, T, T, T), "2003" = c(F, T, T, T, F))
finalyear <- NULL
streak <- NULL
for (i in 1:nrow(dat)) {
    x <- as.numeric(dat[i,2:6])
    y <- max(grep(1, x))
    finalyear[i] <- y
    streak[i] <- sum(x) == y
}
dat$finalyear <- finalyear
dat$streak <- streak

Thanks!

like image 859
Dan Lewer Avatar asked Sep 04 '15 11:09

Dan Lewer


People also ask

What can I use instead of a for loop in R?

Instead of using a for loop, it's better to use a functional. Each functional is tailored for a specific task, so when you recognise the functional you know immediately why it's being used.

What is faster than for loop in R?

Results. The sapply() was faster than the for() loop, but how much faster depends on the values of n .

What is more efficient than for loops?

Conclusions. List comprehensions are often not only more readable but also faster than using “for loops.” They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.


2 Answers

We could use max.col and rowSums as a vectorized approach.

dat$finalyear <- max.col(dat[-1], 'last')

If there are rows without TRUE values, we can make sure to return 0 for that row by multiplying with the double negation of rowSums. The FALSE will be coerced to 0 and multiplying with 0 returns 0 for that row.

dat$finalyear <- max.col(dat[-1], 'last')*!!rowSums(dat[-1])

Then, we create the 'streak' column by comparing the rowSums of columns 2:6 with that of 'finalyear'

dat$streak <-  rowSums(dat[,2:6])==dat$finalyear
dat
#   ids X1999 X2000 X2001 X2002 X2003 finalyear streak
#1   1  TRUE  TRUE  TRUE FALSE FALSE         3   TRUE
#2   2  TRUE FALSE  TRUE  TRUE  TRUE         5  FALSE
#3   3  TRUE  TRUE  TRUE  TRUE  TRUE         5   TRUE
#4   4 FALSE FALSE  TRUE  TRUE  TRUE         5  FALSE
#5   5  TRUE  TRUE  TRUE  TRUE FALSE         4   TRUE

Or a one-line code (it could fit in one-line, but decided to make it obvious by 2-lines ) suggested by @ColonelBeauvel

library(dplyr)
mutate(dat, finalyear=max.col(dat[-1], 'last'), 
            streak=rowSums(dat[-1])==finalyear)
like image 50
akrun Avatar answered Sep 18 '22 23:09

akrun


For-loops are not inherently bad in R, but they are slow if you grow vectors iteratively (like you are doing). There are often better ways to do things. Example of a solution with only apply-functions:

dat$finalyear <- apply(dat[,2:6],MARGIN=1,function(x){max(which(x))})
dat$streak <-  apply(dat[,2:7],MARGIN=1,function(x){sum(x[1:5])==x[6]})

Or option 2, based on comment by @Spacedman:

dat$finalyear <- apply(dat[,2:6],MARGIN=1,function(x){max(which(x))})
dat$streak <-  apply(dat[,2:6],MARGIN=1,function(x){max(which(x))==sum(x)})

> dat
  ids X1999 X2000 X2001 X2002 X2003 finalyear streak
1   1  TRUE  TRUE  TRUE FALSE FALSE         3   TRUE
2   2  TRUE FALSE  TRUE  TRUE  TRUE         5  FALSE
3   3  TRUE  TRUE  TRUE  TRUE  TRUE         5   TRUE
4   4 FALSE FALSE  TRUE  TRUE  TRUE         5  FALSE
5   5  TRUE  TRUE  TRUE  TRUE FALSE         4   TRUE
like image 35
Heroka Avatar answered Sep 22 '22 23:09

Heroka