I am using the ..count..
transformation in geom_bar
and get the warning
position_stack requires non-overlapping x intervals when some of my categories have few counts.
This is best explained using some mock data (my data involves direction and windspeed and I retain names relating to that)
#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20 #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions
#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)
# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()
This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is of relevance that the velocity class with the fewest counts (here "[40,60)") will have 5 counts.
However more velocity classes leads to a warning. For instance, with
FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that
position_stack requires non-overlapping x intervals
and the plot will show data in this category spread out along the x axis. It seems that 5 is the minimum size for a group to have for this to work correctly.
I would appreciate knowing if this is a feature or a bug in stat_bin
(which geom_bar
is using) or if I am simply abusing geom_bar
.
Also, any suggestions how to get around this would be appreciated.
Sincerely
There are two types of bar charts: geom_bar() and geom_col() . geom_bar() makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col() instead.
Dodging preserves the vertical position of an geom while adjusting the horizontal position. position_dodge() requires the grouping variable to be be specified in the global or geom_* layer.
Basically, geom_col is a wrapper over the geom_bar geometry, which has statically defined the statistical transformation to identity. This means that the values for positional parameters x and y are mapped directly to variables from the selected dataset.
To Increase or Decrease width of Bars of BarPlot, we simply assign one more width parameter to geom_bar() function. We can give values from 0.00 to 1.00 as per our requirements.
This occurs because df$dir
is numeric, so the ggplot object assumes a continuous x-axis, and aesthetic parameter group
is based on the only known discrete variable (fill = grp
).
As a result, when there simply aren't that many dir
values in grp = [45,60)
, ggplot gets confused over how wide each bar should be. This becomes more visually obvious if we split the plot into different facets:
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar() +
facet_wrap(~ grp)
> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1] 1 2 3 4 6 7 8 9 10
[1] 1 2 3 4 5 6 7 8 9 10
[1] 2 3 4 5 7 9 10
[1] 2 4 7
We can also check manually that the minimum difference between sorted df$dir
values is 1 for the first three grp
values, but 2 for the last one. The default bar width is thus wider.
The following solutions should all achieve the same result:
1. Explicitly specify the same bar width for all groups in geom_bar()
:
ggplot(data=df,
aes(x=dir,y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar(width = 0.9)
2. Convert dir
to a categorical variable before passing it to aes(x = ...)
:
ggplot(data=df,
aes(x=factor(dir), y=(..count..)/sum(..count..),
fill = grp)) +
geom_bar()
3. Specify that the group
parameter should be based on both df$dir
& df$grp
:
ggplot(data=df,
aes(x=dir,
y=(..count..)/sum(..count..),
group = interaction(dir, grp),
fill = grp)) +
geom_bar()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With