
Show bigger dots when same values are plotted

Tags: r

When I plot the following example:

Participant <- c(1:12)
AnswersDay1 <- c(9,3,9,13,7,12,10,7,9,0,12,11)
Day1Group   <- c(0,1,0,1, 0, 1, 0,1,0,1, 0, 1)


PushFrame <- data.frame(Participant, AnswersDay1, Day1Group)
plot(AnswersDay1, Day1Group)

The plot shows only 10 dots instead of the 12 values in the data.frame. I figured out that this is because three of the (x, y) pairs are exactly the same, so they are drawn on top of each other.

Is it possible to show this in the plot somehow, for example by drawing a bigger dot where several points have the same value?

Peter Piper asked Dec 09 '22 00:12


2 Answers

1) sunflowerplot. You may prefer to use a sunflowerplot, which shows duplicate points as a single point with a spoke for each occurrence. No packages needed.

sunflowerplot(AnswersDay1, Day1Group)


2) jitter. The other common technique is to use jitter, which adds a small amount of random noise so that duplicate points no longer overlap. In this example we jitter the Y variable, but one could instead jitter the X variable or both. No packages needed.

set.seed(123) # set seed of random number generator for reproducibility
plot(AnswersDay1, jitter(Day1Group))
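If the default amount of noise is too large or too small for your data, jitter also accepts an amount argument. The following is a minimal sketch; the value 0.05 is only an assumption chosen for illustration, not something the original answer recommends.

```r
AnswersDay1 <- c(9, 3, 9, 13, 7, 12, 10, 7, 9, 0, 12, 11)
Day1Group   <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)

set.seed(123)  # make the random offsets reproducible

# amount = 0.05 is an illustrative assumption: each value is shifted by
# a uniform random offset of at most 0.05 in either direction.
jittered_group <- jitter(Day1Group, amount = 0.05)

plot(AnswersDay1, jittered_group, ylab = "Day1Group (jittered)")
```

Because the offsets are small, the two groups stay visually separate while the three overlapping points at (9, 0) become distinguishable.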


3) cex. If you really do want to use size as an indicator of the number of duplicates, create a new data frame that holds the count for each distinct point (stored in the Participant column of ag) and then plot as shown. Again, no packages needed.

ag <- aggregate(Participant ~., PushFrame, length)
plot(Day1Group ~ AnswersDay1, ag, cex = Participant, pch = 20)
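To see what the aggregation step produces, here is a self-contained sketch of the same idea. The column name n and the square-root scaling are assumptions added for illustration; they are not part of the answer above.

```r
Participant <- 1:12
AnswersDay1 <- c(9, 3, 9, 13, 7, 12, 10, 7, 9, 0, 12, 11)
Day1Group   <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
PushFrame   <- data.frame(Participant, AnswersDay1, Day1Group)

# Count how many participants share each (AnswersDay1, Day1Group) combination.
ag <- aggregate(Participant ~ ., PushFrame, length)

# Hypothetical rename: the Participant column now holds counts, so call it n.
names(ag)[names(ag) == "Participant"] <- "n"
print(ag)

# Assumption: scale by sqrt(n) so that the dot's area, rather than its
# diameter, grows with the number of duplicates.
plot(Day1Group ~ AnswersDay1, ag, cex = sqrt(ag$n), pch = 20)
```

With cex = n the dot diameter grows linearly with the count, which can exaggerate differences; the square root is one common way to soften that, at the cost of making small counts harder to tell apart.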

G. Grothendieck answered Dec 10 '22 13:12


Yes, there’s absolutely a way of doing this. Set cex to the number of times each point occurs:

plot(AnswersDay1, Day1Group, cex = point_size)

How do you get the point size corresponding to each entry? Well, you count them using table:

tab = table(AnswersDay1, Day1Group)

This is what tab looks like:

           Day1Group
AnswersDay1 0 1
         0  0 1
         3  0 1
         7  1 1
         9  3 0
         10 1 0
         11 0 1
         12 1 1
         13 0 1

That is, for each combination of AnswersDay1 and Day1Group it tells you how often that combination appears. Now you just need to index it using AnswersDay1 and Day1Group:

point_size = diag(tab[as.character(AnswersDay1), as.character(Day1Group)])

Note the as.character calls: they are necessary because the dimension names of the table are character strings, and numeric indices would select by position and pick the wrong element. The indexing returns a 12 × 12 matrix with one row and one column per data point; diag gives us back just its diagonal, which is the count for each original point.
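Putting the steps above together in the order they need to run, a self-contained sketch might look like the following. The idx matrix and point_size_alt are hypothetical names for an alternative lookup (indexing the table with a two-column character matrix) that avoids building the intermediate 12 × 12 matrix; it is shown only as a cross-check, not as part of the original answer.

```r
AnswersDay1 <- c(9, 3, 9, 13, 7, 12, 10, 7, 9, 0, 12, 11)
Day1Group   <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)

# Count every (AnswersDay1, Day1Group) combination.
tab <- table(AnswersDay1, Day1Group)

# Approach from the answer: look up all rows against all columns,
# then keep only the diagonal (row i paired with column i).
point_size <- diag(tab[as.character(AnswersDay1), as.character(Day1Group)])

# Alternative sketch: a two-column character matrix selects one cell
# per row, matched against the table's dimension names.
idx <- cbind(as.character(AnswersDay1), as.character(Day1Group))
point_size_alt <- tab[idx]

# Both lookups should agree for every data point.
stopifnot(all(point_size == point_size_alt))

plot(AnswersDay1, Day1Group, cex = point_size)
```

Note that point_size has one entry per original observation, so the three points at (9, 0) are each drawn at three times the default size on top of one another.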

Konrad Rudolph answered Dec 10 '22 13:12