More efficient ways to use R than 'for' loops

Tags: for-loop, r, apply, survey

I'm a relative newcomer to R so I'm sorry if there's an obvious answer to this. I've looked at other questions and I think 'apply' is the answer but I can't work out how to use it in this case.

I've got a longitudinal survey where participants are invited every year. In some years they fail to take part, and sometimes they die. I need to identify which participants have taken part in a consistent 'streak' from the start of the survey (i.e. if they stop, they stop for good).

I've done this with a 'for' loop, which works fine in the example below. But I have many years and many participants, and the loop is very slow. Is there a faster approach I could use?

In the example, TRUE means they participated in that year. The loop creates two vectors - 'finalyear' for the last year they took part, and 'streak' to show if they completed all years before the finalyear (i.e. cases 1, 3 and 5).

dat <- data.frame(ids = 1:5, "1999" = c(T, T, T, F, T), "2000" = c(T, F, T, F, T), "2001" = c(T, T, T, T, T), "2002" = c(F, T, T, T, T), "2003" = c(F, T, T, T, F))
finalyear <- NULL
streak <- NULL
for (i in 1:nrow(dat)) {
    x <- as.numeric(dat[i,2:6])
    y <- max(grep(1, x))
    finalyear[i] <- y
    streak[i] <- sum(x) == y
}
dat$finalyear <- finalyear
dat$streak <- streak

Thanks!

asked Sep 04 '15 11:09

Dan Lewer

2 Answers

We could use max.col and rowSums as a vectorized approach.

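# Columns 2:6 are selected explicitly so that any 'finalyear' or 'streak'
# columns already present in dat are not treated as survey years
dat$finalyear <- max.col(dat[2:6], 'last')

If some rows may contain no TRUE values at all, max.col with 'last' would still report the last column for them, because every entry in such a row ties. We can return 0 for those rows instead by multiplying by the double negation of rowSums: !!rowSums(...) is FALSE for a row with no TRUE values, FALSE is coerced to 0, and multiplying by 0 gives 0 for that row.

dat$finalyear <- max.col(dat[2:6], 'last') * !!rowSums(dat[2:6])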
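
To see what the correction changes, here is a minimal sketch on made-up data. The data frame `part` and its values are hypothetical and not taken from the question; participant 3 never took part in any year.

```r
# Hypothetical toy data: participant 3 has no TRUE in any year
part <- data.frame(ids = 1:3,
                   X1999 = c(TRUE, FALSE, FALSE),
                   X2000 = c(TRUE, TRUE, FALSE),
                   X2001 = c(FALSE, TRUE, FALSE))

# Uncorrected: an all-FALSE row is a tie, so 'last' picks column 3
raw <- max.col(part[2:4], "last")

# Corrected: !!rowSums() is FALSE (coerced to 0) for rows with no TRUE
fixed <- raw * !!rowSums(part[2:4])

print(raw)    # 2 3 3
print(fixed)  # 2 3 0
```

Whether 0 is the right marker for "never took part" depends on the analysis; NA may be a better fit in some cases.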

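Then we create the 'streak' column by comparing the rowSums of columns 2:6 with 'finalyear'. A participant has an unbroken streak exactly when the number of years attended equals the index of the last year attended.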

dat$streak <-  rowSums(dat[,2:6])==dat$finalyear
dat
#   ids X1999 X2000 X2001 X2002 X2003 finalyear streak
#1   1  TRUE  TRUE  TRUE FALSE FALSE         3   TRUE
#2   2  TRUE FALSE  TRUE  TRUE  TRUE         5  FALSE
#3   3  TRUE  TRUE  TRUE  TRUE  TRUE         5   TRUE
#4   4 FALSE FALSE  TRUE  TRUE  TRUE         5  FALSE
#5   5  TRUE  TRUE  TRUE  TRUE FALSE         4   TRUE

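Or as a single dplyr call (it would fit on one line, but is split across two for readability), as suggested by @ColonelBeauvel:

library(dplyr)
# Columns 2:6 are selected explicitly so that any 'finalyear' or 'streak'
# columns already added to dat are not counted as survey years
mutate(dat, finalyear=max.col(dat[2:6], 'last'), 
            streak=rowSums(dat[2:6])==finalyear)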
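
To check that the vectorized version agrees with a row-by-row loop on something larger than five rows, a rough sketch like the following could be used. The matrix size, the 80% participation rate, and the helper names `loop_version` and `vec_version` are assumptions made for illustration. Timings vary by machine, so none are shown.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 2000                         # hypothetical number of participants
k <- 30                           # hypothetical number of survey years
m <- matrix(runif(n * k) < 0.8, nrow = n)
m[, 1] <- TRUE                    # ensure every row has at least one TRUE

# Row-by-row reference implementation (result vector preallocated)
loop_version <- function(m) {
  final <- integer(nrow(m))
  for (i in seq_len(nrow(m))) final[i] <- max(which(m[i, ]))
  data.frame(finalyear = final, streak = rowSums(m) == final)
}

# Vectorized implementation using max.col and rowSums
vec_version <- function(m) {
  final <- max.col(m, "last")
  data.frame(finalyear = final, streak = rowSums(m) == final)
}

res_loop <- loop_version(m)
res_vec  <- vec_version(m)

# Both approaches should give the same answer for every row
stopifnot(all(res_loop$finalyear == res_vec$finalyear),
          all(res_loop$streak == res_vec$streak))

# Compare elapsed times on your own machine
system.time(loop_version(m))
system.time(vec_version(m))
```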

answered Sep 18 '22 23:09

akrun

For loops are not inherently bad in R, but they are slow when you grow vectors iteratively, as your loop does by starting from NULL and assigning finalyear[i] and streak[i] one element at a time. There are often better ways to do things. Here is a solution that uses only apply:

dat$finalyear <- apply(dat[,2:6],MARGIN=1,function(x){max(which(x))})
dat$streak <-  apply(dat[,2:7],MARGIN=1,function(x){sum(x[1:5])==x[6]})

Or option 2, based on comment by @Spacedman:

dat$finalyear <- apply(dat[,2:6],MARGIN=1,function(x){max(which(x))})
dat$streak <-  apply(dat[,2:6],MARGIN=1,function(x){max(which(x))==sum(x)})

> dat
  ids X1999 X2000 X2001 X2002 X2003 finalyear streak
1   1  TRUE  TRUE  TRUE FALSE FALSE         3   TRUE
2   2  TRUE FALSE  TRUE  TRUE  TRUE         5  FALSE
3   3  TRUE  TRUE  TRUE  TRUE  TRUE         5   TRUE
4   4 FALSE FALSE  TRUE  TRUE  TRUE         5  FALSE
5   5  TRUE  TRUE  TRUE  TRUE FALSE         4   TRUE
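
To illustrate the point about growing vectors, the original loop can also be kept and made cheaper by preallocating the result vectors and converting the year columns to a matrix once, outside the loop. This is a sketch that assumes every participant has at least one TRUE (otherwise max(which(x)) returns -Inf with a warning). The variable name `years` is introduced here for illustration.

```r
dat <- data.frame(ids = 1:5,
                  X1999 = c(TRUE, TRUE, TRUE, FALSE, TRUE),
                  X2000 = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                  X2001 = c(TRUE, TRUE, TRUE, TRUE, TRUE),
                  X2002 = c(FALSE, TRUE, TRUE, TRUE, TRUE),
                  X2003 = c(FALSE, TRUE, TRUE, TRUE, FALSE))

years <- as.matrix(dat[2:6])   # convert once; matrix row access is cheap
n <- nrow(years)

finalyear <- integer(n)        # preallocate instead of growing from NULL
streak <- logical(n)

for (i in seq_len(n)) {
  x <- years[i, ]
  finalyear[i] <- max(which(x))        # index of the last year attended
  streak[i] <- sum(x) == finalyear[i]  # no gaps before the last year
}

dat$finalyear <- finalyear
dat$streak <- streak
print(dat)
```

This gives the same 'finalyear' and 'streak' columns as the output above (3, 5, 5, 5, 4 and TRUE, FALSE, TRUE, FALSE, TRUE).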

answered Sep 22 '22 23:09

Heroka
