
Subset all levels of a single factor

Tags:

r

Is there a way to subset all levels of a single factor in one clean swoop?

Case: Suppose you have a data frame in which one column is a factor (data$factor), and you want to create subset data frames that each contain only one level of that factor. This is simple to do with separate subset commands when the factor has only a few levels. But what if it has a large number of levels (e.g. 50+)? Is there a command or a clever way to create all the subsets without having to write 50+ subset commands?

whistler asked Jul 28 '13 05:07




1 Answer

The split() function solves this problem without a loop.

Assuming the factor you want to subset (or subgroup) by is the column "factor" of the data frame "data", do:

subsets<-split(data, data$factor, drop=TRUE)

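This will create a list of data frames, one per level of the factor, with each element named after its level. Because drop=TRUE discards levels that have no rows, the length of the list equals the number of levels that actually occur in the data.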
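As a rough illustration of what split() returns, here is a minimal sketch on a made-up data frame. The column values and the unused level "d" are assumptions chosen only to show the effect of drop=TRUE; they are not from the original question.

```r
# Hypothetical example data: a factor column with one unused level ("d")
data <- data.frame(
  factor = factor(c("a", "b", "a", "c", "b"), levels = c("a", "b", "c", "d")),
  value  = c(1, 2, 3, 4, 5)
)

# One data frame per level that actually occurs in the data;
# drop = TRUE omits the empty level "d"
subsets <- split(data, data$factor, drop = TRUE)

length(subsets)   # number of non-empty levels
names(subsets)    # list elements are named after the levels
subsets[["a"]]    # the rows where data$factor == "a"
```

With drop=FALSE (the default), the list would also contain a zero-row data frame for the unused level.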

If you need to put each subset in a separate data frame, you can access them by doing the following:

group1<-subsets[[1]]
group2<-subsets[[2]]
...
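
With 50+ levels, writing one assignment per group brings the original problem back. A possible alternative, sketched below on made-up data, is to leave the subsets in the list and work on them by name or with sapply(), or to create the variables in bulk with list2env(). The names group1, group2, ... are illustrative assumptions.

```r
# Hypothetical example data
data <- data.frame(
  factor = factor(c("a", "b", "a", "c")),
  value  = c(10, 20, 30, 40)
)
subsets <- split(data, data$factor, drop = TRUE)

# Access a subset by its level name instead of its position
subsets[["b"]]

# Apply the same operation to every subset without naming each one
rows_per_level <- sapply(subsets, nrow)

# If separate variables are really needed, rename the list elements
# and export them all at once (creates group1, group2, group3 here)
groups <- setNames(subsets, paste0("group", seq_along(subsets)))
list2env(groups, envir = environment())
```

Keeping the list is usually easier to maintain, since the code does not depend on how many levels the factor has.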

whistler answered Oct 31 '22 05:10