I have a graph with 12 variables divided into two groups. I can't use facets, but by using colour and shape I have been able to make the visualization easy to understand. However, some points overlap, partially or completely. I am using jitter to deal with these, but as you can see from the attached graph, jitter moves all the points around, not just the overlapping ones.
Is there a way to apply jitter or dodge conditionally, only to the points that overlap? Better still, is there a way to put the partially overlapping points side by side? My x-axis consists of discrete categories, so a slight shift to the left or right won't matter. I tried geom_dotplot with binaxis='y', but that completely spoils the x-axis.
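To make "conditional" concrete, the first step is to find which points actually collide. The sketch below is an assumption-laden illustration, not part of either post: the helper flag_overlaps() and the tolerance tol are hypothetical, and it flags a point when another point in the same x category lies within tol on the y-axis. Only the flagged points would then need a sideways offset.

```r
# Hypothetical helper: flag points lying within `tol` (in y units) of another
# point in the same x category; only these would need to be nudged sideways.
flag_overlaps <- function(x, y, tol = 0.05) {
  flagged <- logical(length(y))
  for (g in unique(x)) {
    idx <- which(x == g)
    if (length(idx) < 2) next
    d <- as.matrix(dist(y[idx]))  # pairwise |y_i - y_j| within the category
    diag(d) <- Inf                # ignore each point's distance to itself
    flagged[idx] <- apply(d, 1, min) < tol
  }
  flagged
}

# Toy example (made-up values): two near-ties in "A", an exact tie in "B"
x <- c("A", "A", "A", "B", "B")
y <- c(0.010, 0.020, 0.500, 0.913, 0.913)
print(flag_overlaps(x, y))
```

The tolerance has to be chosen relative to the plotted point size and the y range; a fixed value in data units is only a rough proxy for visual overlap.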
Edit: This graph does exactly what I am looking for.
Further edit: here is the code behind this visualization.
library(ggplot2)
library(reshape2)  # provides melt()
disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")
# Keep this order on the x-axis instead of ggplot's default alphabetical ordering
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)
# involved aspects
intensive <- c( 0.660, 0.438, 0.515, 0.028, 0.443)
comparative <- c( 0.361, 0.928, 0.270, 0.285, 0.311)
wh_adverbs <- c( 0.431, 0.454, 0.069, 0.330, 0.577)
past_tense <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928, 0.207, 0.162, -0.299, -0.045)
personal <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns <- c(-0.602, -0.519, -0.388, -0.244, -0.196)
df1 <- data.frame(disciplines,
"Intensive Adverbs"=intensive,
"Comparative Adverbs"=comparative,
"Wh-adverbs (WRB)"=wh_adverbs,
"Verb: Past Tense"=past_tense,
"Verb: Present Tense"=present_tense,
"Conjunctions"=conjunctions,
"Personal Pronouns"=personal,
"Interrogative Pronouns"=interrogative,
"Subjective/Objective Pronouns"=sbj_objective,
"Possessive Pronouns"=possessive,
"3rd-person verbs"=thrd_person,
"Nouns"=nouns,
check.names=F)
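df1.m <- melt(df1, id.vars="disciplines")  # long format: disciplines, variable, value
# Store the feature group in the data frame so every layer (and later code) can find it
df1.m$grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')
g <- ggplot(df1.m, aes(x=disciplines, y=value, group=grp, shape=grp, colour=variable))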
g <- g + geom_hline(yintercept=0, linewidth=9, color="white")  # 'linewidth' needs ggplot2 >= 3.4.0; older versions use size=9
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
g <- g + geom_point(size=4, alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))
I am curious what others might suggest, but to get the side-by-side effect you could code the major x-axis categories as numbers (10, 20, ..., 50) plus or minus a small offset determined by the category you are using for colour. The points would then sit at, for example, 9.6, 9.8, 10.0, 10.2, ... around 10, and 19.6, 19.8, 20.0, 20.2, ... around 20. This gives an orderly plot, instead of assigning the fractional offsets randomly as jitter does.
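The arithmetic behind that layout, as a small base-R sketch. The sizes (5 disciplines, 5 colour categories) and the 0.2 step are assumptions chosen to reproduce the example positions; the real data has 12 colour categories.

```r
# Hypothetical toy layout: 5 disciplines, 5 colour categories each
n_disc <- 5
n_cat  <- 5
step   <- 0.2  # assumed horizontal gap between neighbouring colour categories

disc_idx <- rep(seq_len(n_disc), each = n_cat)   # 1,1,1,1,1,2,2,...
cat_idx  <- rep(seq_len(n_cat), times = n_disc)  # 1,2,3,4,5,1,2,...

# Major position from the discipline, minor offset from the colour category,
# centred on zero so each cluster straddles its major tick
x <- disc_idx * 10 + (cat_idx - mean(seq_len(n_cat))) * step
print(x[1:10])
```

Keeping the total spread of a cluster well below the gap between major ticks (here 0.8 versus 10) keeps neighbouring clusters visually separate.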
Here is a quick implementation of that idea for your data set. It offsets the main x variable, disciplines, by one sixth of the integer code of the sub-category variable, variable, and uses the result as the x value, without jitter:
M = df1.m
ScaleFactor = 6
xadj = as.numeric(M$variable)/ScaleFactor
xadj = xadj - mean(xadj) # shift it to center around zero
x10 = as.numeric(M$disciplines) * 10
M$x = x10 + xadj
g = ggplot(M, aes(group=grp, x, value, shape=grp, colour=variable))
# x is numeric now, so label the cluster centres on a continuous scale
g + geom_point(size=4, alpha=I(0.7)) + scale_x_continuous(breaks=sort(unique(x10)), labels=levels(M$disciplines))
Note that within each discipline the points are evenly spaced and appear in the same left-to-right order. (This code omits the curve fitting and the other layers shown in the figure.)
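A quick check of the offsets that ScaleFactor = 6 produces, written as a standalone sketch. The factor below is a stand-in for M$variable, assuming 12 levels each repeated once per discipline, as melt() produces for df1.

```r
# Stand-in for M$variable: 12 levels, each repeated for the 5 disciplines
variable <- factor(rep(paste0("v", 1:12), each = 5), levels = paste0("v", 1:12))

ScaleFactor <- 6
xadj <- as.numeric(variable) / ScaleFactor
xadj <- xadj - mean(xadj)  # centre the offsets on zero

# One offset per level; rounding guards against floating-point near-duplicates
offsets <- sort(unique(round(xadj, 10)))
print(round(offsets, 3))
```

The offsets are spaced 1/6 apart and span less than one unit either side of zero, so with disciplines placed 10 units apart the clusters cannot collide.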
Variation: the effect is even clearer if you "quantize" your y values (snap them to a coarse grid), so that more points share a y value and plot side by side.
M$valmod = M$value - M$value %% 0.2 + .1  # snap each value to the centre of its 0.2-wide bin
Then use valmod in place of value in the aes() statement to see the effect.
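The quantization line can be tried on its own. This sketch wraps it in a hypothetical helper, quantize(), and applies it to a few sample values; because R's %% returns a non-negative remainder for a positive divisor, negative values are also floored to their bin, making the expression equivalent to floor(v / Qfact) * Qfact + Qfact / 2.

```r
# Hypothetical helper mirroring M$value - M$value %% Qfact + Qfact/2:
# floor each value to a multiple of Qfact, then move it to the bin centre
quantize <- function(v, Qfact = 0.2) v - v %% Qfact + Qfact / 2

vals <- c(0.660, 0.438, -0.299, 0.028)  # sample values taken from df1
print(quantize(vals))
```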
To get the category labels back, set the breaks and labels of the x scale manually. This version also uses a different ScaleFactor for broader spacing, together with the quantized y values:
M=df1.m
ScaleFactor = 3
# Note this could just be xadj instead of adding to data frame
M$xadj = as.numeric(M$variable)/ScaleFactor
M$xadj = M$xadj - mean(M$xadj) # shift it to center around zero
M$x10 = as.numeric(M$disciplines) * 10
M$x = M$x10 + M$xadj
Qfact = 0.2 # resolution to quantize y values
M$valmod = M$value - M$value %% Qfact + Qfact/2 # clump y to given resolution
g = ggplot(M, aes(group=grp, x, valmod, shape=grp, colour=variable)) +
  # x is numeric, so use a continuous scale with one labelled break per cluster centre
  scale_x_continuous(breaks=sort(unique(M$x10)), labels=levels(M$disciplines))
g + geom_point(size=3,alpha=I(0.7))
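For comparison, ggplot2's built-in position_dodge() can produce a similar side-by-side arrangement while keeping the x-axis discrete, so no numeric recoding or manual labels are needed. This is a swapped-in technique, not the approach above; the toy data and the width of 0.6 are assumptions. Note that, like jitter, it shifts every point, not only the overlapping ones.

```r
library(ggplot2)

# Hypothetical toy data: 3 discrete categories x 4 series, all at the same y,
# i.e. the worst case of complete overlap
d <- expand.grid(disciplines = factor(c("A", "B", "C")),
                 variable    = factor(paste0("v", 1:4)))
d$value <- 0.5

# Grouping by the colour variable tells position_dodge() what to spread out;
# width is the total horizontal span each category's cluster may use
p <- ggplot(d, aes(disciplines, value, colour = variable, group = variable)) +
  geom_point(size = 3, position = position_dodge(width = 0.6))

# Inspect the x positions ggplot2 actually computed for the points
pos <- layer_data(p, 1)
print(head(pos[, c("x", "y", "group")]))
```

Each series keeps the same slot within every category, which preserves the "same left-to-right order" property of the manual approach.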