
How to make the levels of a factor in a data frame consistent across all columns?

I have a data frame with 5 different columns:

         Test1   Test2   Test3  Test4  Test5 
Sample1  PASS    PASS    FAIL    WARN   WARN
Sample2  PASS    PASS    FAIL    PASS   WARN
Sample3  PASS    FAIL    FAIL    PASS   WARN
Sample4  PASS    FAIL    FAIL    PASS   WARN
Sample5  PASS    WARN    FAIL    WARN   WARN

In each column, R assigns each level a different integer code. In column 1, "PASS" is 1. In column 2, "PASS" is 2 and "FAIL" is 1. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.

The codes follow alphabetical order. I need "PASS" to be 1, "WARN" to be 2 and "FAIL" to be 3 in every column, so that I can then convert the data frame into a matrix and turn it into a heatmap.

Currently the codes depend on which values appear in a specific column, sorted alphabetically.

How can I keep it constant throughout the entire data frame?
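The behaviour described above can be reproduced with a minimal sketch using a hypothetical vector x (not the question's data): factor() takes the distinct values it finds and sorts them alphabetically unless a levels argument is given, so the integer codes depend on what each column happens to contain.

```r
# Hypothetical column containing all three statuses
x <- c("PASS", "FAIL", "WARN", "PASS")

# Default: levels are the distinct values, sorted alphabetically
f_default <- factor(x)
levels(f_default)      # "FAIL" "PASS" "WARN"
as.integer(f_default)  # 2 1 3 2

# A column containing only "WARN" gets a single level, so WARN is coded 1
as.integer(factor(c("WARN", "WARN")))  # 1 1

# Explicit levels fix the order, and therefore the integer codes
f_fixed <- factor(x, levels = c("PASS", "WARN", "FAIL"))
levels(f_fixed)        # "PASS" "WARN" "FAIL"
as.integer(f_fixed)    # 1 3 2 1
```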

gaelgarcia asked Jan 30 '15 at 04:01


2 Answers

You can give every column of the dataset "df" the same levels in the same order: loop over the columns with lapply, convert each one to a factor again with the specified levels, and assign the result back to the corresponding columns.

lvls <- c('PASS', 'WARN', 'FAIL')
df[] <- lapply(df, factor, levels=lvls)  # assigning into df[] keeps df a data.frame (row names included)
str(df)
# 'data.frame': 5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
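One thing worth checking with this approach: factor(..., levels = lvls) does not raise an error for a value that is missing from lvls; it silently turns it into NA. A minimal sketch, using a hypothetical data frame df_typo with one misspelled entry ("Pass"):

```r
lvls <- c("PASS", "WARN", "FAIL")

# Hypothetical data with one entry that matches none of the levels
df_typo <- data.frame(Test1 = c("PASS", "Pass", "WARN"),
                      Test2 = c("FAIL", "PASS", "PASS"))

df_typo[] <- lapply(df_typo, factor, levels = lvls)

# The unmatched "Pass" became NA instead of a fourth level
sum(is.na(df_typo))           # 1
which(is.na(df_typo$Test1))   # 2
```

Counting NAs right after the conversion is a cheap way to catch such values before building the matrix.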

If you prefer to use data.table:

library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]

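setDT converts the "data.frame" to a "data.table" in place (by reference). := then assigns the re-converted factor columns (lapply(..)) back to the columns named by names(df), again by reference. .SD stands for "Subset of Data"; with no grouping, it holds all the columns of the table.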

data

df <- structure(list(
  Test1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "PASS", class = "factor"),
  Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", "PASS", "WARN"), class = "factor"),
  Test3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"),
  Test4 = structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), class = "factor"),
  Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "WARN", class = "factor")),
  .Names = c("Test1", "Test2", "Test3", "Test4", "Test5"),
  row.names = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  class = "data.frame")
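The question's end goal is an integer matrix for a heatmap. Once every column shares the same levels, base R's data.matrix() replaces each factor with its integer codes (and keeps the row names), so PASS, WARN and FAIL become 1, 2 and 3 in every column. A minimal sketch, rebuilding the question's table as character columns; the colour palette and the zlim setting are assumptions chosen so that each code always gets the same colour:

```r
lvls <- c("PASS", "WARN", "FAIL")

# The question's table, typed in as character columns
df <- data.frame(
  Test1 = c("PASS", "PASS", "PASS", "PASS", "PASS"),
  Test2 = c("PASS", "PASS", "FAIL", "FAIL", "WARN"),
  Test3 = c("FAIL", "FAIL", "FAIL", "FAIL", "FAIL"),
  Test4 = c("WARN", "PASS", "PASS", "PASS", "WARN"),
  Test5 = c("WARN", "WARN", "WARN", "WARN", "WARN"),
  row.names = paste0("Sample", 1:5)
)
df[] <- lapply(df, factor, levels = lvls)

# Factors become their integer codes: PASS = 1, WARN = 2, FAIL = 3
m <- data.matrix(df)
print(m)

# No dendrograms/reordering, no rescaling; zlim pins codes 1..3 to the 3 colours
heatmap(m, Rowv = NA, Colv = NA, scale = "none",
        col = c("green", "yellow", "red"), zlim = c(1, 3))
```

Passing zlim through to image() keeps the colour mapping stable even if a future dataset happens not to contain one of the three statuses.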
akrun answered Oct 26 '22 at 15:10


Using dplyr:

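library(dplyr)
# Re-create every column as a factor with the same fixed levels (across() needs dplyr >= 1.0.0)
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))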

You get:

#> str(df)
#'data.frame':  5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
Steven Beaupré answered Oct 26 '22 at 14:10