I'm working with categorical data and I'm trying to plot a scatterplot where the size of the points should represent the frequencies on the location of that point.
I tried it first with jitter but I'm unhappy with that solution.
I thought I could create a Frequencies column but didn't manage to create a code for that.
qplot(X, Y, data=datatable, geom=c("point"))
Has anyone an idea?
thx
The data points or dots, which appear on a scatter plot, represent the individual values of each of those data points and also allow pattern identification when looking at the data holistically.
Use a scatter plot to determine whether or not two variables have a relationship or correlation. Are you trying to see if your two variables might mean something when put together? Plotting a scattergram with your data points can help you to determine whether there's a potential relationship between them.
Here's a guess at what you're after. In the df
data frame below, x
and y
are your categorical variables. There are various ways to get the frequency counts. Here, the ddply()
function from the plyr
package is used. Followed by the plot. In the call to ggplot
: the size
aesthetic ensures that the point sizes represent the frequencies; and the scale_size_discrete()
function controls the size of the points on the plot.
# Some toy data
df <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor"), y = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3",
"4", "5"), class = "factor")), .Names = c("x", "y"), row.names = c(NA,
79L), class = "data.frame")
# Required packages
library(plyr)
library(ggplot2)
# Get the frequency counts
dfc <- ddply(df, c("x", "y"), "nrow", .drop = FALSE)
#dfc
# The plot
ggplot(data = dfc, aes(x = x, y = y, size = factor(nrow))) +
geom_point() +
scale_size_discrete(range = c(1, 10))
Or the same plot using the df
data frame - the unaggregated data.
ggplot(data = df, aes(x = x, y = y)) +
stat_sum(aes(size = factor(..n..)), geom = "point") +
scale_size_discrete(range = c(1, 10))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With