Looping over factor levels in R - how to operate on two consecutive levels

I need to loop over the levels of a factor in an R data.frame. Inside the loop I need to operate on subsets of the data.frame defined by pairs of these levels, where each pair consists of two consecutive unique levels of the factor.

Here is an example of what I've tried:

library(dplyr)

df <- data.frame(fac = rep(c("A", "B", "C"), 3), stringsAsFactors = TRUE)

for(i in levels(df$fac)){

   if(i != levels(df$fac)[length(levels(df$fac))]){
      df %>% filter(fac %in% c(i, i + 1))  # fails: i is a level name (a string)
   }
}

I am trying to include level i and the level that follows it, but obviously the expression i + 1 won't do the trick. How can I get around this? Do I have to convert fac to numeric, or is there a neater solution?

EDIT: The output (for this example) should be these two data.frames:

dfAB <- df %>% filter(fac %in% c("A", "B"))
dfBC <- df %>% filter(fac %in% c("B", "C"))
Antti asked Jan 26 '18 10:01



2 Answers

The problem is that you loop over the levels of fac, which are a character vector, so i is a string such as "A" and R cannot add 1 to it.
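To see the failure directly, here is a minimal sketch in which `f` is a hypothetical stand-in for `df$fac`. It also shows one way to keep looping over level names: `match()` returns the position of `i` among the levels, so the next level is simply one position further on (`next_level()` is a hypothetical helper, not part of any package).

```r
# Hypothetical stand-in for df$fac, built as an explicit factor
f <- factor(rep(c("A", "B", "C"), 3))
lv <- levels(f)            # character vector: "A" "B" "C"

i <- lv[1]                 # looping over levels(f) gives strings like this
msg <- tryCatch(i + 1, error = function(e) conditionMessage(e))
print(msg)                 # "non-numeric argument to binary operator"

# match() returns the position of i among the levels, so lv[pos + 1]
# is the next level (NA when i is already the last level)
next_level <- function(i, lv) lv[match(i, lv) + 1]
print(next_level("A", lv)) # "B"
print(next_level("C", lv)) # NA
```

Looping over positions instead of names, as below, avoids the lookup altogether.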

The following works:

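library(dplyr)

df <- data.frame(fac = rep(c("A", "B", "C"), 3))

# Make fac an explicit factor so that levels() returns "A", "B", "C"
df <- df %>% 
  mutate(fac = factor(fac, levels = c("A", "B", "C")))

# Loop over level positions (not level names) so that i + 1 is meaningful;
# the last level has no successor, so it is skipped
for(i in seq_along(levels(df$fac))){
  if(i != length(levels(df$fac))){
    # inside filter(), `fac` refers to the fac column of df
    df %>% filter(fac %in% c(levels(fac)[i], levels(fac)[i+1])) %>% print()
  }
}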

#   fac
# 1   A
# 2   B
# 3   A
# 4   B
# 5   A
# 6   B
#   fac
# 1   B
# 2   C
# 3   B
# 4   C
# 5   B
# 6   C

The fac column has to be a factor; otherwise levels(fac) returns NULL and the filter matches nothing. I added print() inside the loop to show the result, but you probably want to store each subset somewhere (e.g. in a list).
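Following the suggestion to store the subsets rather than print them, a minimal sketch that iterates over positions 1 to n - 1 with `lapply()` and names each list element after its pair of levels. It uses base R subsetting instead of `filter()` so it runs without dplyr, and the `"A_B"` naming scheme is an arbitrary choice.

```r
df <- data.frame(fac = factor(rep(c("A", "B", "C"), 3)))

lv <- levels(df$fac)

# seq_len(length(lv) - 1) yields 1, 2: every level that has a successor.
# (It is empty when there is only one level, unlike 1:(length(lv) - 1).)
subsets <- lapply(seq_len(length(lv) - 1), function(i) {
  df[df$fac %in% lv[c(i, i + 1)], , drop = FALSE]
})
names(subsets) <- paste(head(lv, -1), tail(lv, -1), sep = "_")

subsets$A_B   # the six rows with fac "A" or "B"
subsets$B_C   # the six rows with fac "B" or "C"
```

Unlike `filter()`, base subsetting keeps the original row names (1, 2, 4, 5, ...), which can be handy for tracing rows back to `df`.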

kath answered Sep 28 '22 19:09


A solution without a loop:

library(dplyr)

# Create example data frame
df <- data.frame(fac = rep(c("A", "B", "C"), 3),
                 stringsAsFactors = TRUE)

# Create all pairwise combinations of the factor's unique values
m <- combn(unique(df$fac), m = 2)

# Drop the pairs whose level codes do not differ by exactly 1,
# then turn the remaining columns into a data frame (one column per pair)
re <- which(as.numeric(m[2, ]) - as.numeric(m[1, ]) != 1)
m2 <- as.data.frame.matrix(m[, -re])

# Filter df by m2
df_final <- lapply(m2, function(col){
  df %>% filter(fac %in% col)
})

df_final
# $V1
#   fac
# 1   A
# 2   B
# 3   A
# 4   B
# 5   A
# 6   B
# 
# $V2
#   fac
# 1   B
# 2   C
# 3   B
# 4   C
# 5   B
# 6   C
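A variation on the same loop-free idea that skips generating all combinations: pairing `head(levels, -1)` with `tail(levels, -1)` yields exactly the consecutive pairs, so nothing has to be filtered out afterwards. It also avoids a pitfall of `m[, -re]`: when `re` is empty (for example with only two levels), `-integer(0)` selects no columns at all rather than all of them. A minimal base-R sketch, which orders pairs by `levels()` rather than by first appearance in the data:

```r
df <- data.frame(fac = factor(rep(c("A", "B", "C"), 3)))

lv <- levels(df$fac)

# Consecutive pairs c("A", "B") and c("B", "C"); Map() zips the two vectors
pairs <- Map(c, head(lv, -1), tail(lv, -1))

# One subset per pair; row names are reset to 1..n to match filter() output
df_final <- lapply(pairs, function(p) {
  out <- df[df$fac %in% p, , drop = FALSE]
  rownames(out) <- NULL
  out
})

names(df_final)  # "A" "B": Map() names each element after its first level
```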
www answered Sep 28 '22 18:09