ggplot2 geom_bar position failure

Tags:

r

ggplot2

histogram

stacked

I am using the ..count.. transformation in geom_bar and get the warning "position_stack requires non-overlapping x intervals" when some of my categories have few counts.

This is best explained using some mock data (my data involve wind direction and wind speed, and I retain names relating to that):


library(ggplot2)

# make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20  #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions

#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)

# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()

This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is relevant that the velocity class with the fewest counts (here "[40,60)") has 5 counts.

[Figure: three categories of size 20 each (https://i.stack.imgur.com/EqlLB.png)]

However, using more velocity classes leads to a warning. For instance, with


FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)

the velocity class with the fewest counts (now "[45,60)") has only 3 counts, and ggplot2 warns that

position_stack requires non-overlapping x intervals

and the plot shows the data in this category spread out along the x axis.

[Figure: four categories of size 15 each; the last one, with three elements, is not stacked on top of the corresponding bar (https://i.stack.imgur.com/tBHNa.png)]

It seems that 5 is the minimum size a group must have for this to work correctly.

I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.

Also, any suggestions how to get around this would be appreciated.

Sincerely


asked May 30 '18 11:05

stuttungr

1 Answer

This occurs because df$dir is numeric, so ggplot treats the x-axis as continuous, and the group aesthetic defaults to the only discrete variable in the plot (fill = grp).

As a result, the bar width is worked out separately for each grp level, and when a level such as grp = [45,60) contains only a few, widely spaced dir values, its bars come out wider than those of the other levels. This becomes more visually obvious if we split the plot into different facets:


ggplot(data=df,
       aes(x=dir,y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar() + 
  facet_wrap(~ grp)

[Figure: facet view (https://i.stack.imgur.com/ggoVE.png)]


> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1]  1  2  3  4  6  7  8  9 10
[1]  1  2  3  4  5  6  7  8  9 10
[1]  2  3  4  5  7  9 10
[1] 2 4 7

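We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. Since the default bar width scales with that minimum difference, the bars of the last group are twice as wide as the others and overlap the neighbouring bars.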
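The manual check can be sketched in base R using the distinct dir values printed above. Two things here are assumptions rather than something shown in the question: the 0.9 factor reflects my understanding that geom_bar() defaults to 90% of the smallest gap between x values within a group, and the class labels other than [45,60) are inferred from the breaks.

```r
# Distinct dir values per speed class, copied from the printed output above.
# Labels other than "[45,60)" are inferred from breaks = seq(0, 60, by = 15).
dirs_by_grp <- list(
  "[0,15)"  = c(1, 2, 3, 4, 6, 7, 8, 9, 10),
  "[15,30)" = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  "[30,45)" = c(2, 3, 4, 5, 7, 9, 10),
  "[45,60)" = c(2, 4, 7)
)

# Smallest gap between neighbouring x values within each group
min_gap <- sapply(dirs_by_grp, function(x) min(diff(sort(unique(x)))))

# Assumed default bar width: 90% of that gap
default_width <- 0.9 * min_gap
print(default_width)  # 0.9 for the first three classes, 1.8 for [45,60)
```

With a width of 1.8, the [45,60) bar at dir = 4 would span 3.1 to 4.9 and so partially cover the 0.9-wide bars at dir = 3 and dir = 5, which is presumably what position_stack is complaining about.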

The following solutions should all achieve the same result:

1. Explicitly specify the same bar width for all groups in geom_bar():


ggplot(data=df,
       aes(x=dir,y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar(width = 0.9)

2. Convert dir to a categorical variable before passing it to aes(x = ...):


ggplot(data=df,
       aes(x=factor(dir), y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar()

3. Specify that the group parameter should be based on both df$dir and df$grp:


ggplot(data=df,
       aes(x=dir,
           y=(..count..)/sum(..count..),
           group = interaction(dir, grp),
           fill = grp)) + 
  geom_bar()

[Figure: resulting plot (https://i.stack.imgur.com/ymle0.png)]
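A further option, not one of the three above and offered only as a sketch, is to compute the relative frequencies yourself and plot the pre-aggregated table, so that no per-group width has to be inferred. The names freq and p are introduced here for illustration. Tabulating dir also makes it a factor, so this is close in spirit to option 2. The plotting step is guarded because it assumes ggplot2 is installed.

```r
# Rebuild the mock data from the question (four speed classes)
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20                     # mock speeds
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)    # mock directions
grp <- cut(FF, breaks = seq(0, 60, by = 15),
           ordered_result = TRUE, right = FALSE)
df <- data.frame(dir = dir, grp = grp)

# Share of all observations in each (dir, grp) cell; columns: dir, grp, Freq
freq <- as.data.frame(prop.table(table(dir = df$dir, grp = df$grp)))

# Plot the pre-computed shares; geom_col() stacks them per dir value
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq, aes(x = dir, y = Freq, fill = grp)) + geom_col()
}
```

As a side note, recent ggplot2 releases deprecate the ..count.. notation in favour of after_stat(count), so the aes() calls above can be written with y = after_stat(count / sum(count)) instead.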


answered Oct 02 '22 06:10

Z.Lin
