
"Bin" continuous values in ggplot2 based on criteria to obtain more distinct colours (like factor level coloring)?

Tags: r, ggplot2

For now, I'm just using something like this:

test_data$level <- rep("", nrow(test_data))
test_data[test_data$value <= 1, ]$level <- "1"
test_data[test_data$value > 1 & test_data$value <= 2, ]$level <- "2"
...
test_data[test_data$value > 4 & test_data$value <= 5, ]$level <- "5"

Just wondering if there's a better way to do this in R, or a way to simply apply some scale argument via ggplot2 to do the categorizing.


There could be a couple of approaches to this, so it was hard to phrase my question exactly. Here's the gist... I have data something like so:

set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))
head(test_data, 10)
   var1 var2    value
1     A    a 2.150310
2     A    b 4.153221
3     A    c 2.635908
4     A    d 4.532070
5     A    e 4.761869
6     B    a 1.182226
7     B    b 3.112422
8     B    c 4.569676
9     B    d 3.205740
10    B    e 2.826459

I have a lot more data points, and am plotting something like this:

library(ggplot2)
p <- ggplot(test_data, aes(x = var1, y = var2, colour = value))
p <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0.1))
p

Which gives something like so:

[image: the resulting jitter plot, coloured with the default continuous gradient]

My actual data comes from a subjective evaluation with 1-5 ratings, but I've grouped similar questions and averaged them, so the values are no longer integers.

I'm plotting the ratings per factor combination to visualize which combinations yielded higher ratings. The default continuous scale doesn't really "pop", so I'd like the colour scale to treat "bins" of these values (0-1, 1-2, ... 4-5) the way scale_colour_discrete treats factor levels.

So, my question(s):

1) Is it possible with ggplot2 to "bin" these somehow via scale_colour_continuous so I can get the default factor level coloring scheme to apply even though this is continuous data?

2) If not, is there an easier way to create a new vector where I substitute numbers/letters for my values based on criteria? I'm a bit of an R novice, so the only approach I could think of was a bunch of if() or conditional statements (test_data[test_data$value > 0 & test_data$value < 1, "level"] <- "a" or something like that).

asked Feb 18 '23 14:02 by Hendy


2 Answers

The easiest solution is to do

ggplot(transform(test_data, Discrete = cut(value, seq(0, 5, 1), include.lowest = TRUE)), ...

Now the data.frame passed to ggplot includes a factor column, Discrete, derived from the value column, so you can use aes(..., colour = Discrete, ...). Because transform() returns a modified copy, the extra column exists only within that ggplot call and test_data itself is left unchanged.

To keep the discrete column in test_data, assign it instead:

test_data$Discrete <- cut(test_data$value, seq(0, 5, 1), include.lowest = TRUE)
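
As a rough end-to-end sketch of the cut() approach, the snippet below rebuilds the question's example data, bins value into unit-wide intervals, and maps the resulting factor to colour. The column name Discrete follows the answer. The requireNamespace() guard and the use of geom_jitter(width, height) are choices made for this illustration, not part of the original answer.

```r
# Recreate the question's example data (30 averaged ratings between 1 and 5).
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# Bin into [0,1], (1,2], ..., (4,5]. Intervals are right-closed by default;
# include.lowest = TRUE also closes the first interval on the left.
test_data$Discrete <- cut(test_data$value, breaks = seq(0, 5, 1),
                          include.lowest = TRUE)

# Inspect the bin labels and how many points fall into each bin.
print(levels(test_data$Discrete))
print(table(test_data$Discrete))

# A factor mapped to colour gets ggplot2's default discrete palette.
# The plot is drawn only if ggplot2 is installed (an assumption of this sketch).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(test_data, aes(x = var1, y = var2, colour = Discrete)) +
    geom_jitter(width = 0.1, height = 0.1)
  print(p)
}
```

If plain labels such as "1" to "5" are preferred over interval notation, cut() also accepts a labels argument, for example labels = 1:5.
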

answered May 01 '23 00:05 by Señor O


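You can switch from the colour bar legend to the discrete-style legend by passing guide = 'legend' to a continuous scale. The colours are still interpolated continuously; only the legend is drawn as separate keys.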

library(RColorBrewer) # for brewer.pal
ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
  geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
  scale_colour_gradientn(guide = 'legend', colours = brewer.pal(n = 5, name = 'Set1'))
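
For the first part of the question (binning inside the scale itself), a possible alternative is one of ggplot2's binned colour scales. This sketch assumes ggplot2 3.3.0 or later, which is newer than the code in the answers here, and uses scale_colour_fermenter() with the same Set1 palette. The breaks at 1:4 and limits of c(0, 5) are chosen to mirror the 0-1, 1-2, ... 4-5 bins from the question.

```r
library(ggplot2)

# Recreate the question's example data.
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# value stays numeric; the scale itself cuts it at 1, 2, 3 and 4, so every
# point inside a bin gets the same colour (assumes ggplot2 >= 3.3.0).
p <- ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
  geom_jitter(width = 0.1, height = 0.1) +
  scale_colour_fermenter(palette = "Set1", breaks = 1:4, limits = c(0, 5))
print(p)
```

Unlike the guide = 'legend' trick, this changes the colours on the plot as well as the legend. Set1 is a qualitative palette, so the bin colours carry no inherent order; a sequential Brewer palette may read better if the ordering of the ratings matters.
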

[image: the same plot with a discrete-style legend]

answered May 01 '23 01:05 by mnel