ggplot2 geom_bar position failure

I am using the ..count.. transformation in geom_bar and get the warning position_stack requires non-overlapping x intervals when some of my categories have few counts.

This is best explained using some mock data (my data involves direction and wind speed, and I retain names relating to that):

library(ggplot2)

#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20  #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions

#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)

# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()

This works fine, and the resulting plot shows the frequency of directions grouped according to speed. Note that the velocity class with the fewest counts (here "[40,60)") has 5 counts.

[Figure: three categories of width 20 each]

However, using more velocity classes leads to a warning. For instance, with

FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
 

the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that

position_stack requires non-overlapping x intervals

and the plot will show data in this category spread out along the x axis.

[Figure: four categories of width 15 each; the last one, with three elements, is not stacked on top of the corresponding bars]

It seems that 5 is the minimum size a group must have for this to work correctly.

I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.

Also, any suggestions how to get around this would be appreciated.

Sincerely

stuttungr asked May 30 '18 11:05


1 Answer

This occurs because df$dir is numeric, so ggplot assumes a continuous x-axis, and the group aesthetic is based on the only known discrete variable (fill = grp).

As a result, when only a few distinct dir values fall in grp = [45,60), ggplot computes the default bar width separately for that group and ends up with a different width than for the others. This becomes more visually obvious if we split the plot into different facets:

ggplot(data=df,
       aes(x=dir,y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar() + 
  facet_wrap(~ grp)

[Figure: facet view]

> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1]  1  2  3  4  6  7  8  9 10
[1]  1  2  3  4  5  6  7  8  9 10
[1]  2  3  4  5  7  9 10
[1] 2 4 7

We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. The default bar width for the last group is thus wider, so its bars overlap the neighbouring x intervals.
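
That manual check can be sketched in base R. The toy data frame below is hypothetical (it is not the question's df) and is chosen so that one group has adjacent dir values while the other has gaps. The 0.9 factor is an assumption based on the documented geom_bar default of 90% of the resolution of the data.

```r
# Hypothetical toy data, not the question's df:
# group "a" has adjacent dir values, group "b" has gaps.
toy <- data.frame(
  dir = c(1, 2, 3, 2, 4, 7),
  grp = factor(c("a", "a", "a", "b", "b", "b"))
)

# Smallest gap between distinct, sorted dir values within each group
min_gap <- tapply(toy$dir, toy$grp, function(x) min(diff(sort(unique(x)))))
print(min_gap)
# a: 1, b: 2

# Assumed default bar width per group: 90% of that gap
print(0.9 * min_gap)
```

Group "b" ends up with twice the gap of group "a", which is the same pattern as the [45,60) class in the question.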

The following solutions should all achieve the same result:

1. Explicitly specify the same bar width for all groups in geom_bar():

ggplot(data=df,
       aes(x=dir,y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar(width = 0.9)

2. Convert dir to a categorical variable before passing it to aes(x = ...):

ggplot(data=df,
       aes(x=factor(dir), y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar()

3. Specify that the group parameter should be based on both df$dir & df$grp:

ggplot(data=df,
       aes(x=dir,
           y=(..count..)/sum(..count..),
           group = interaction(dir, grp),
           fill = grp)) + 
  geom_bar()

[Figure: resulting plot]
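
On newer ggplot2 releases (3.3.0 and later, as far as I am aware), the ..count.. notation has been superseded by after_stat(). A minimal sketch of solution 2 in that notation, rebuilding the question's mock data so that it runs on its own (the drop argument from the question is left out here, since cut() does not use it):

```r
library(ggplot2)

# Rebuild the question's mock data (four speed classes of width 15)
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)
FFcut <- cut(FF, breaks = seq(0, 60, by = 15),
             ordered_result = TRUE, right = FALSE)
df <- data.frame(dir = dir, grp = FFcut)

# Solution 2 with after_stat(): a discrete x axis, so bars share one width
p <- ggplot(df, aes(x = factor(dir),
                    y = after_stat(count / sum(count)),
                    fill = grp)) +
  geom_bar()
print(p)
```

The same substitution should apply to solutions 1 and 3: replace (..count..)/sum(..count..) with after_stat(count / sum(count)) and leave the rest unchanged.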

Z.Lin answered Oct 02 '22 06:10