I need to loop over the factor levels in an R data.frame. Inside the loop I need to operate on subsets of the data.frame defined by pairs of these levels. Each pair consists of two consecutive unique levels of that factor.
Here is an example of what I've tried:
require(dplyr)
df <- data.frame(fac = rep(c("A", "B", "C"), 3))
for(i in levels(fac)){
if(i != levels(fac)[length(levels(fac))]){
df %>% filter(fac %in% c(i, i + 1))
}
}
I am trying to include level i and its subsequent level, but obviously the expression i + 1 won't do the trick. How can I get around this? Do I have to make the variable fac numerical, or is there a neater solution available?
EDIT: The output (for this example) should be these two data.frames:
dfAB <- df %>% filter(fac %in% c("A", "B"))
dfBC <- df %>% filter(fac %in% c("B", "C"))
Similarly, the levels of a factor can be checked using the levels() function. How do you create a factor in R? We can create a factor using the function factor(). The levels of a factor are inferred from the data if not provided.
For example, a data field such as marital status may contain only values from single, married, separated, divorced, or widowed. In such a case, we know the possible values beforehand, and these predefined, distinct values are called levels. The following is an example of a factor in R:
> x
[1] single married married single
Levels: married single
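A minimal sketch of creating such a factor, assuming the four values shown in the printed output above. The variable name status and its explicit level set are illustrative assumptions, taken from the marital status values listed in the text.

```r
# Levels are inferred from the data when none are supplied
x <- factor(c("single", "married", "married", "single"))
levels(x)  # "married" "single"

# Levels can also be declared up front, including values not yet observed
# (status is a hypothetical name used only for this illustration)
status <- factor(c("single", "married", "married", "single"),
                 levels = c("single", "married", "separated",
                            "divorced", "widowed"))
nlevels(status)  # 5
```

Declaring the levels explicitly also fixes their order, which matters when "consecutive" levels are needed, as in the question above.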
The head is followed by a code block (i.e. the body of our loop). In this block we can execute basically any R syntax we want. Afterwards, the for-loop checks whether it reached the last object of the collection specified in the head of the loop.
It is very important to understand that for-loops in R do not iterate over regular sequences, but over a collection of objects. For that reason, we are able to loop through vectors of character strings.
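To illustrate the difference, here is a small sketch that loops once over the elements of a character vector and once over its positions. The names lv, seen and positions are made up for this example.

```r
lv <- c("A", "B", "C")

# Looping over the elements: i is a character string on each pass
seen <- character(0)
for (i in lv) {
  seen <- c(seen, i)
}
seen  # "A" "B" "C"

# Looping over the positions: i is an integer, so i + 1 is well defined
positions <- integer(0)
for (i in seq_along(lv)) {
  positions <- c(positions, i)
}
positions  # 1 2 3
```

The second form is the one to use when the loop body needs the neighbouring element, since lv[i + 1] only makes sense for a numeric i.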
The problem is that you loop over the levels of fac, which are character strings, so R cannot add 1 to i.
The following works:
library(dplyr)
df <- data.frame(fac = rep(c("A", "B", "C"), 3))
df <- df %>%
  mutate(fac = factor(fac, levels = c("A", "B", "C")))
for(i in seq_along(levels(df$fac))){
  if(i != length(levels(df$fac))){
    df %>% filter(fac %in% c(levels(fac)[i], levels(fac)[i+1])) %>% print()
  }
}
# fac
# 1 A
# 2 B
# 3 A
# 4 B
# 5 A
# 6 B
# fac
# 1 B
# 2 C
# 3 B
# 4 C
# 5 B
# 6 C
The fac column has to be a factor (otherwise the filtering doesn't work, because levels() returns NULL for a character vector).
I added the print() inside the loop to print the result, but you probably want to store it somewhere (e.g. in a list).
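One possible way to store the results in a list instead of printing them is sketched below. The list name res and the helper vector pair_names are assumptions for this illustration; the element names dfAB and dfBC follow the question's desired output.

```r
library(dplyr)

df <- data.frame(fac = factor(rep(c("A", "B", "C"), 3),
                              levels = c("A", "B", "C")))
lv <- levels(df$fac)

# One name per consecutive pair of levels: "dfAB", "dfBC"
pair_names <- paste0("df", head(lv, -1), tail(lv, -1))

# Pre-allocate a named list, then fill one slot per pair
res <- setNames(vector("list", length(pair_names)), pair_names)
for (i in seq_along(pair_names)) {
  res[[i]] <- df %>% filter(fac %in% lv[c(i, i + 1)])
}

names(res)  # "dfAB" "dfBC"
```

Looping over seq_along(pair_names) also removes the need for the if() check on the last level, since there is one fewer pair than there are levels.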
A solution without a loop:
library(dplyr)
# Create example data frame
df <- data.frame(fac = rep(c("A", "B", "C"), 3),
                 stringsAsFactors = TRUE)
# Create all pairwise combinations of the factor values
m <- combn(unique(df$fac), m = 2)
# Find the pairs whose level codes do not differ by exactly 1,
# then drop them so that only consecutive pairs remain (one per column)
re <- which(as.numeric(m[2, ]) - as.numeric(m[1, ]) != 1)
m2 <- as.data.frame.matrix(m[, -re])
# Filter df by each column (pair of levels) of m2
df_final <- lapply(m2, function(col){
  df %>% filter(fac %in% col)
})
df_final
# $V1
# fac
# 1 A
# 2 B
# 3 A
# 4 B
# 5 A
# 6 B
#
# $V2
# fac
# 1 B
# 2 C
# 3 B
# 4 C
# 5 B
# 6 C
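As an alternative sketch in base R only, the consecutive pairs can be built directly by offsetting the level vector against itself, which avoids generating and then discarding non-consecutive combinations. This is not taken from either answer above; the names first, second and pairs are assumptions for this illustration.

```r
df <- data.frame(fac = factor(rep(c("A", "B", "C"), 3)))
lv <- levels(df$fac)

first  <- head(lv, -1)  # every level except the last:  "A" "B"
second <- tail(lv, -1)  # every level except the first: "B" "C"

# Subset df once per consecutive pair of levels
pairs <- Map(function(a, b) df[df$fac %in% c(a, b), , drop = FALSE],
             first, second)
names(pairs) <- paste0("df", first, second)

names(pairs)  # "dfAB" "dfBC"
```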