
How do I set factor levels to the order they appear in a data frame?

Tags:

r

I want to create a heat map using ggplot, but I want to order the y-axis by the number of observations. I order the data frame by the column N and append the number of observations to the group name so that it appears in the axis label. When I plot the data, however, ggplot re-orders the axis alphabetically by group name. Is there a way to set factor levels based on the order in which they appear in the data frame?

Some data:

library(dplyr)
library(tidyr)
library(ggplot2)

school <- c("School A", "School B", "School C", "School D", "School E", "School F")
N <- c(25,28,12,22,30,25)
var1 <- c(1,0,1,1,0,1)
var2 <- c(0,0,0,1,0,1)
var3 <- c(0,1,0,1,1,1)

df <- as_tibble(data.frame(school, N, var1, var2, var3))

df <- arrange (df, N) %>%
  gather (variable, value, var1:var3)

df$school <- paste0 (df$school, " (", df$N, ")")

df <- select (df, school, variable, value)

ggplot(df, aes(variable, school)) + geom_tile(aes(fill = value), colour = "white") + 
  scale_fill_gradient(low = "white",high = "steelblue")

Ultimately I want the order of schools to be:

School C (12)

School D (22)

School A (25)

School F (25)

School B (28)

School E (30)

Since I want to do this for multiple plots, I would like a way to do it automatically rather than re-setting the factor levels by hand each time.

GregRousell asked Oct 20 '14 15:10



2 Answers

One way around this is to change your ggplot call to

ggplot(df, aes(variable, factor(school, levels = unique(school)))) + ...

To avoid typing this every time, you can create a function

f <- function(x) factor(x, levels = unique(x))

and then call it by ggplot(df, aes(variable, f(school))) + ...

Note that this will place the first level of the factor at the bottom of the plot, because ggplot draws a discrete y-axis from the bottom up. If you want the first level at the top, change f to function(x) factor(x, levels = rev(unique(x))).
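As a quick check of what the two helpers do, here is a minimal base-R sketch. The vector school_long is a hypothetical stand-in for df$school after arrange() and gather() (the sorted labels repeated once per variable), and f_rev is just an illustrative name for the reversed variant; neither appears in the original code.

```r
# Hypothetical stand-in for df$school: labels already sorted by N,
# then repeated three times, as gather() stacks var1, var2 and var3.
school <- c("School C (12)", "School D (22)", "School A (25)",
            "School F (25)", "School B (28)", "School E (30)")
school_long <- rep(school, times = 3)

# Levels in order of first appearance (the helper from the answer)
f <- function(x) factor(x, levels = unique(x))

# Illustrative name for the reversed variant: first row ends up at the top
f_rev <- function(x) factor(x, levels = rev(unique(x)))

levels(factor(school_long))  # default: sorted alphabetically
levels(f(school_long))       # same order as the rows
levels(f_rev(school_long))   # row order, reversed
```

Because both helpers only look at unique(x), the repeated rows produced by gather() do not affect the resulting level order.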

konvas answered Sep 24 '22 08:09


Add the following forcats pipe to the code just before the call to ggplot().

library(forcats)
df$school <- fct_inorder(df$school) %>% fct_rev()

fct_inorder() sets the factor levels in the order in which the values first appear in the data frame, and fct_rev() reverses them so that the first row ends up at the top of the plot rather than the bottom.
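The effect of the two forcats calls can be inspected without plotting. This is a small sketch that assumes the forcats package is installed; the school vector is a hypothetical stand-in for the already-sorted df$school column.

```r
library(forcats)

# Hypothetical stand-in for df$school, already sorted by N
school <- c("School C (12)", "School D (22)", "School A (25)",
            "School F (25)", "School B (28)", "School E (30)")

in_order <- fct_inorder(school)  # levels follow first appearance
reversed <- fct_rev(in_order)    # same levels, reversed

levels(in_order)
levels(reversed)
```

Only the level order changes; the values themselves stay in their original row order, so the data frame does not need to be re-sorted afterwards.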

Joe answered Sep 23 '22 08:09