ggplot: colour points by groups based on user defined colours

I am trying to define the colours of groups of points plotted in ggplot. I adapted code from this post:

Color ggplot points based on defined color codes

but as soon as I have more than one row defined by the same grouping variable (rather than a separate colour for each row), the code fails, and I can't figure out why. Below is a reproducible example:

library(ggplot2)

# create some data
zone <- c("E", "E", "C", "C", "C", "E", "E")  # grouping variable
col <- c(50, 100, 150, 200, 250, 300, 350)    # x variable
D <- c(.4, .45, .20, .22, .30, .31, .35)      # y variable
df1 <- data.frame(zone, D, col); df1

# create a colour scheme based on grouping variable 'zone'
zone <- c("E", "C")
color.codes <- as.character(c("#3399FF", "#FF0000"))
color.names <- c("blue", "red")
df2 <- data.frame(zone, color.codes, color.names); df2

# merge color specifications with data
df <- merge(df1, df2, by = "zone", all.x = TRUE, all.y = TRUE); df

The data then look like this:

zone    D   col color.codes color.names
C     0.20  150     #FF0000         red
C     0.22  200     #FF0000         red
C     0.30  250     #FF0000         red
E     0.40   50     #3399FF        blue
E     0.45  100     #3399FF        blue
E     0.31  300     #3399FF        blue
E     0.35  350     #3399FF        blue

The goal is to produce a plot where points in zone 'C' are red and those in 'E' are blue, but using the code from the example cited everything is plotted in red:

p <- ggplot(data=df, aes(col, D, colour = zone))+ 
  geom_point() 
p + scale_colour_manual(breaks = df$zone, values = df$color.codes)

Can anyone see the fatal flaw that stops this code from working across groups?
Thanks so much in advance.

user3267282 asked Feb 03 '14 20:02


1 Answer

You are somewhere between two different solutions.

One approach is to leave the colors out of the df data frame and instead specify the mapping between zone and desired color in the scale call:

ggplot(data=df, aes(col, D, colour = zone))+ 
  geom_point() +
  scale_colour_manual(values=setNames(color.codes, zone))


Note that this uses neither the color.codes and color.names columns of df nor df2 itself. It uses the standalone color.codes vector and the two-element zone vector from which df2 was built. If you only have a data frame like df2 and not the separate vectors, use setNames(df2$color.codes, df2$zone) instead.
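
To make the mapping concrete, here is a minimal sketch of what setNames() builds from a lookup table like df2. It uses base R only. The name zone.colours is hypothetical, and stringsAsFactors = FALSE is an added assumption so that the codes stay character in R versions before 4.0.

```r
# Lookup table as in the question (color.names omitted for brevity).
# stringsAsFactors = FALSE is an assumption added here, not part of the question.
df2 <- data.frame(
  zone = c("E", "C"),
  color.codes = c("#3399FF", "#FF0000"),
  stringsAsFactors = FALSE
)

# Named character vector: the names are the zone levels,
# the values are the colour codes. (zone.colours is a hypothetical name.)
zone.colours <- setNames(df2$color.codes, df2$zone)
print(zone.colours)

# Lookup by name does not depend on the order of the entries.
print(zone.colours[c("C", "E")])
```

A vector like this can be passed as values = zone.colours to scale_colour_manual, which then pairs each zone with its colour by name rather than by position.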

The other approach maps color directly to the color codes and uses scale_color_identity, but then has to go through some additional labeling to get the legend right.

ggplot(data=df, aes(col, D, colour = color.codes)) +
  geom_point() +
  scale_colour_identity("zone", breaks=color.codes, labels=zone, guide="legend")
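
For reference, a self-contained sketch of the same identity-scale idea, assuming ggplot2 is installed. It takes the legend breaks and labels from df2 instead of the standalone vectors. The name drawn is hypothetical, and stringsAsFactors = FALSE is an added assumption so the colour codes stay character.

```r
library(ggplot2)

# Data and lookup table from the question.
# stringsAsFactors = FALSE is an assumption added here.
df1 <- data.frame(
  zone = c("E", "E", "C", "C", "C", "E", "E"),
  D = c(.4, .45, .20, .22, .30, .31, .35),
  col = c(50, 100, 150, 200, 250, 300, 350),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  zone = c("E", "C"),
  color.codes = c("#3399FF", "#FF0000"),
  stringsAsFactors = FALSE
)
df <- merge(df1, df2, by = "zone", all = TRUE)

# Map colour to the codes themselves; the identity scale uses them as-is.
# breaks/labels relabel the legend entries with the zone names.
p <- ggplot(df, aes(col, D, colour = color.codes)) +
  geom_point() +
  scale_colour_identity(name = "zone", breaks = df2$color.codes,
                        labels = df2$zone, guide = "legend")

# Inspect the colours ggplot2 will draw for each x value
# (drawn is a hypothetical name; call print(p) to render the plot).
drawn <- layer_data(p, 1)
print(unique(drawn[, c("x", "colour")]))
```

Because breaks and labels both come from df2, each colour code and its zone label stay paired row by row.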


The first is, in my opinion, the better solution.

Brian Diggs answered Oct 18 '22 08:10