How to assign colors to categorical variables in ggplot2 that have stable mapping?

People also ask

How do I assign a color to a variable in R?

In R, colors can be specified either by name (e.g col = “red”) or as a hexadecimal RGB triplet (such as col = “#FFCC00”). You can also use other color systems such as ones taken from the RColorBrewer package.

Is color categorical data?

Categorical or nominalHair color is also a categorical variable having a number of categories (blonde, brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to lowest.

How do you add color to a geom point?

To color the points in a scatterplot using ggplot2, we can use colour argument inside geom_point with aes. The color can be passed in multiple ways, one such way is to name the particular color and the other way is to giving a range or using a variable.

Which geom should be used to plot categorical data?

Barplots. Barplots are useful for visualizing categorical data. By default, geom_bar accepts a variable for x, and plots the number of times each value of x (in this case, wall type) appears in the dataset.

For simple situations like the exact example in the OP, I agree that Thierry's answer is the best. However, I think it's useful to point out another approach that becomes easier when you're trying to maintain consistent color schemes across multiple data frames that are not all obtained by subsetting a single large data frame. Managing the factors levels in multiple data frames can become tedious if they are being pulled from separate files and not all factor levels appear in each file.

One way to address this is to create a custom manual colour scale as follows:

#Some test data
dat <- data.frame(x=runif(10),y=runif(10),
        grp = rep(LETTERS[1:5],each = 2),stringsAsFactors = TRUE)

#Create a custom color scale
library(RColorBrewer)
myColors <- brewer.pal(5,"Set1")
names(myColors) <- levels(dat$grp)
colScale <- scale_colour_manual(name = "grp",values = myColors)

and then add the color scale onto the plot as needed:

#One plot with all the data
p <- ggplot(dat,aes(x,y,colour = grp)) + geom_point()
p1 <- p + colScale

#A second plot with only four of the levels
p2 <- p %+% droplevels(subset(dat[4:10,])) + colScale

The first plot looks like this:

enter image description here

and the second plot looks like this:

enter image description here

This way you don't need to remember or check each data frame to see that they have the appropriate levels.

I am in the same situation pointed out by malcook in his comment: unfortunately the answer by Thierry does not work with ggplot2 version 0.9.3.1.

png("figure_%d.png")
set.seed(2014)
library(ggplot2)
dataset <- data.frame(category = rep(LETTERS[1:5], 100),
    x = rnorm(500, mean = rep(1:5, 100)),
    y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))

ggplot(dataset, aes(x = x, y = y, colour = fCategory)) + geom_point()
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point()

Here it is the first figure:

ggplot A-E, mixed colors

and the second figure:

ggplot ADE, mixed colors

As we can see the colors do not stay fixed, for example E switches from magenta to blu.

As suggested by malcook in his comment and by hadley in his comment the code which uses limits works properly:

ggplot(subdata, aes(x = x, y = y, colour = fCategory)) +       
    geom_point() + 
    scale_colour_discrete(drop=TRUE,
        limits = levels(dataset$fCategory))

gives the following figure, which is correct:

correct ggplot

This is the output from sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] ggplot2_0.9.3.1

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4       grid_3.0.2        
 [5] gtable_0.1.2       labeling_0.2       MASS_7.3-29        munsell_0.4.2     
 [9] plyr_1.8           proto_0.3-10       RColorBrewer_1.0-5 reshape2_1.2.2    
[13] scales_0.2.3       stringr_0.6.2

This is an old post, but I was looking for answer to this same question,

Why not try something like:

scale_color_manual(values = c("foo" = "#999999", "bar" = "#E69F00"))

If you have categorical values, I don't see a reason why this should not work.

Based on the very helpful answer by joran I was able to come up with this solution for a stable color scale for a boolean factor (TRUE, FALSE).

boolColors <- as.character(c("TRUE"="#5aae61", "FALSE"="#7b3294"))
boolScale <- scale_colour_manual(name="myboolean", values=boolColors)

ggplot(myDataFrame, aes(date, duration)) + 
  geom_point(aes(colour = myboolean)) +
  boolScale

Since ColorBrewer isn't very helpful with binary color scales, the two needed colors are defined manually.

Here myboolean is the name of the column in myDataFrame holding the TRUE/FALSE factor. date and duration are the column names to be mapped to the x and y axis of the plot in this example.

The easiest solution is to convert your categorical variable to a factor prior to the subsetting. Bottomline is that you need a factor variable with exact the same levels in all your subsets.

library(ggplot2)
dataset <- data.frame(category = rep(LETTERS[1:5], 100), 
    x = rnorm(500, mean = rep(1:5, 100)), y = rnorm(500, mean = rep(1:5, 100)))
dataset$fCategory <- factor(dataset$category)
subdata <- subset(dataset, category %in% c("A", "D", "E"))

With a character variable

ggplot(dataset, aes(x = x, y = y, colour = category)) + geom_point()
ggplot(subdata, aes(x = x, y = y, colour = category)) + geom_point()

With a factor variable

ggplot(dataset, aes(x = x, y = y, colour = fCategory)) + geom_point()
ggplot(subdata, aes(x = x, y = y, colour = fCategory)) + geom_point()

Related questions
                            
                                How to split data into training/testing sets using sample function
                            
                                Remove duplicated rows
                            
                                Capitalize the first letter of both words in a two word string
                            
                                Workflow for statistical analysis and report writing
                            
                                Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2
                            
                                Determine the data types of a data frame's columns
                            
                                How to convert a table to a data frame
                            
                                session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium
                            
                                Access lapply index names inside FUN
                            
                                Does the ternary operator exist in R?
                            
                                Convert data.frame column to a vector?
                            
                                Change size of axes title and labels in ggplot2
                            
                                Fixing a multiple warning "unknown column"
                            
                                Error: could not find function ... in R
                            
                                Relative frequencies / proportions with dplyr
                            
                                How can I handle R CMD check "no visible binding for global variable" notes when my ggplot2 syntax is sensible?
                            
                                How to prevent ifelse() from turning Date objects into numeric objects
                            
                                How to suppress warnings globally in an R Script
                            
                                Reasons for using the set.seed function
                            
                                How to use R's ellipsis feature when writing your own function?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to assign colors to categorical variables in ggplot2 that have stable mapping?

Tags:

r

ggplot2

People also ask

Recent Activity

Donate For Us