ggplot2: Why do symbol sizes differ when 'size' is included inside vs. outside the aes statement?

Tags:

r

ggplot2

I have created quite a few maps using base R, but I am now trying to perform similar tasks using ggplot2 because of the ease with which multiple plots can be arranged on a single page. Basically, I am plotting the locations at which samples of a particular species of interest have been collected and want the symbol size to reflect the total weight of the species collected at that location. Creating the base map and various layers has not been an issue, but I'm having trouble getting the symbol sizes and associated legend the way I want them.

The problem is demonstrated in the workable example below. When I include 'size' outside of aes, the symbol sizes appear to be scaled appropriately (plot1). But when I put 'size' inside the aes statement (in order to get a legend) the symbol sizes are no longer correct (plot2). It looks like ggplot2 has rescaled the data. This should be a simple task so I am clearly missing something very basic. Any help understanding this would be appreciated.

library(ggplot2)

#create a very simple dataset that includes locations and total weight of samples collected from each site
catch.data<-data.frame(long=c(-50,-52.5,-52,-54,-53.8,-52),
                       lat=c(48,54,54,55,52,50),
                       wt=c(2,38,3,4,25,122))

#including 'size' outside of aes results in no legend
#but the symbol sizes are represented correctly
plot1<-ggplot(catch.data,aes(x=long,y=lat)) +
  geom_point(size=catch.data$wt,colour="white",fill="blue",shape=21)    

#including 'size' within aes appears necessary in order to create a legend
#but the symbol sizes are not represented correctly
plot2<-ggplot(catch.data,aes(x=long,y=lat)) +
  geom_point(aes(size=catch.data$wt),colour="white",fill="blue",shape=21)

Turbo74 asked Oct 17 '22 07:10


2 Answers

First, you shouldn't reference the data frame name inside aes(); it messes up the legend. The correct version is:

plot3 <- ggplot(catch.data,aes(x=long,y=lat)) + 
         geom_point(aes(size=wt),colour="white",fill="blue",shape=21)

Next, to get more variation in the symbol sizes, experiment with the range argument of scale_size_continuous, which sets the smallest and largest symbol sizes, e.g.

plot3 + scale_size_continuous(range = range(catch.data$wt) / 5)

Change it a few times and see which one works for you. Note that representing numbers as areas is a common visualization pitfall (search for e.g. "why pie charts are bad").

Edit: answering the comment below, you could introduce a fixed scaling by e.g. scale_size_continuous(limits = c(1, 200), range = c(1, 20)).
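As a rough sketch of the fixed scaling suggested above, the scale can be stored once and added to each plot so that a given weight always gets the same symbol size. The object names fixed_size and plot4, the legend title, and the breaks are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)

# Same example data as in the question
catch.data <- data.frame(long = c(-50, -52.5, -52, -54, -53.8, -52),
                         lat  = c(48, 54, 54, 55, 52, 50),
                         wt   = c(2, 38, 3, 4, 25, 122))

# Fixed limits: weights between 1 and 200 are mapped onto symbol sizes
# between 1 and 20, regardless of the weights present in a given plot.
# The legend title and breaks are illustrative choices.
fixed_size <- scale_size_continuous(name = "Weight",
                                    limits = c(1, 200),
                                    range = c(1, 20),
                                    breaks = c(10, 50, 100, 200))

plot4 <- ggplot(catch.data, aes(x = long, y = lat)) +
  geom_point(aes(size = wt), colour = "white", fill = "blue", shape = 21) +
  fixed_size

# Inspect the sizes that were actually computed for each point
layer_data(plot4)$size
```

Because the limits no longer depend on the data, the same fixed_size object can be reused across several maps that are arranged on one page.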

tonytonov answered Nov 15 '22 07:11


Any value inside aes() is mapped from a variable in the data and passed through a scale, whereas a value specified outside aes() is set as is, with no scaling.

Refer to "Difference between passing options in aes() and outside of it in ggplot2".

Also see the documentation: http://ggplot2.tidyverse.org/reference/aes.html
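The difference between mapping and setting can be checked by looking at the sizes ggplot2 actually computes. The sketch below uses the data from the question; the object names are illustrative. The last plot uses scale_size_identity(), which is not mentioned in either answer and is shown only as one possible way to keep the raw values while still getting a legend.

```r
library(ggplot2)

# Same example data as in the question
catch.data <- data.frame(long = c(-50, -52.5, -52, -54, -53.8, -52),
                         lat  = c(48, 54, 54, 55, 52, 50),
                         wt   = c(2, 38, 3, 4, 25, 122))

base <- ggplot(catch.data, aes(x = long, y = lat))

# Setting: size is outside aes(), so the weights are used as sizes directly
set_plot <- base + geom_point(size = catch.data$wt, shape = 21)

# Mapping: size is inside aes(), so wt is rescaled by the default size scale
map_plot <- base + geom_point(aes(size = wt), shape = 21)

# Mapping with an identity scale: wt is used unchanged, and a legend is drawn
id_plot <- base + geom_point(aes(size = wt), shape = 21) +
  scale_size_identity(guide = "legend")

set_sizes <- layer_data(set_plot)$size  # the raw weights
map_sizes <- layer_data(map_plot)$size  # rescaled into the default range
id_sizes  <- layer_data(id_plot)$size   # the raw weights again

range(set_sizes)
range(map_sizes)
range(id_sizes)
```

Comparing the three ranges shows why plot2 in the question looks "rescaled": the mapped sizes are squeezed into the default range of the size scale, while the set sizes are taken literally.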

Megha John answered Nov 15 '22 07:11