
How do I use different point sizes to represent the number of observations at each point's location?




I'm working with categorical data and I'm trying to draw a scatterplot in which the size of each point represents the frequency of observations at that location.

I tried it first with jitter but I'm unhappy with that solution.

I thought I could create a Frequencies column, but I didn't manage to write the code for that.

    qplot(X, Y, data=datatable, geom=c("point"))

Does anyone have an idea?


Ventrue12 asked May 11 '12 13:05



1 Answer

Here's a guess at what you're after. In the df data frame below, x and y are your categorical variables. There are various ways to get the frequency counts; here, the ddply() function from the plyr package is used, and the plot follows. In the call to ggplot(), the size aesthetic makes the point sizes represent the frequencies, and scale_size_discrete() controls the range of point sizes on the plot.

# Some toy data
df <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 
5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor"), y = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 5L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3", 
"4", "5"), class = "factor")), .Names = c("x", "y"), row.names = c(NA, 
79L), class = "data.frame")

# Required packages
library(plyr)
library(ggplot2)

# Get the frequency counts
dfc <- ddply(df, c("x", "y"), "nrow", .drop = FALSE)

# The plot
ggplot(data = dfc, aes(x = x, y = y, size = factor(nrow))) + 
    geom_point() + 
    scale_size_discrete(range = c(1, 10))
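
As noted above, ddply() is only one of several ways to get the frequency counts. Here is a minimal sketch of one alternative, assuming base R's table() is acceptable for your data. The small toy data frame and the names toy, counts, and counts_nonzero are made up for illustration. The sketch maps the count to size as a continuous variable via scale_size_area() rather than as a factor, and it skips the plot if ggplot2 is not installed.

```r
# Hypothetical toy data: two categorical variables with repeated pairs
toy <- data.frame(
  x = factor(c("1", "1", "2", "2", "2", "3")),
  y = factor(c("1", "1", "1", "2", "2", "3"))
)

# Cross-tabulate, then reshape to one row per (x, y) combination.
# Combinations that never occur are kept with Freq = 0.
counts <- as.data.frame(table(x = toy$x, y = toy$y), responseName = "Freq")

# Drop the empty combinations so they are not drawn
counts_nonzero <- counts[counts$Freq > 0, ]

# Plot only if ggplot2 is available; Freq is mapped as a continuous size
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts_nonzero,
                       ggplot2::aes(x = x, y = y, size = Freq)) +
    ggplot2::geom_point() +
    ggplot2::scale_size_area(max_size = 10)
  print(p)
}
```

With scale_size_area() the point area is proportional to the count, which may read more naturally than discrete size steps when the counts span a wide range.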


Or draw the same plot directly from the df data frame, i.e. the unaggregated data:

ggplot(data = df, aes(x = x, y = y)) +
  # after_stat(n) needs ggplot2 >= 3.3.0; older versions spell it ..n..
  stat_sum(aes(size = factor(after_stat(n))), geom = "point") +
  scale_size_discrete(range = c(1, 10))
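
In more recent ggplot2 releases (2.0.0 and later, as far as I know) there is also geom_count(), which wraps stat_sum() and maps the count n to point size by default. The following is a sketch under that assumption, not a replacement for the code above. The data frame toy is a small made-up example, and the count is treated as a continuous size rather than as a factor.

```r
library(ggplot2)

# Small hypothetical data set: two categorical variables with repeated pairs
toy <- data.frame(
  x = factor(c("1", "1", "2", "2", "2", "3")),
  y = factor(c("1", "1", "1", "2", "2", "3"))
)

# geom_count() counts the observations at each (x, y) location and
# maps that count to point size
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_count() +
  scale_size_area(max_size = 10)

print(p)
```

To check the counts behind the plot, layer_data(p) returns the computed data, including the n column.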
Sandy Muspratt answered Sep 28 '22 08:09