
Values getting dropped from ggplot2 histogram when specifying limits

Tags: r, ggplot2

I'd like to create a ggplot2 histogram in which the plot's limits are equal to the smallest and largest values in the data set, without excluding those values from the actual histogram.

I get the behavior I'm looking for when using base graphics. Specifically, the second histogram below shows all of the same values as the first histogram (i.e., no bins are excluded in the second histogram), even though I've included an xlim argument to the second plot:

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)

hist(mtcars$wt, breaks = 30, main = "No limits added")

hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")

[Plots: "No limits added" (https://i.stack.imgur.com/5HBXd.png) and "Limits added" (https://i.stack.imgur.com/ExC0C.png)]

ggplot2 isn't giving me this behavior though:

library(ggplot2)

# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")

p + xlim(xlim) + ggtitle("Limits added")

[Plots: "No limits added" (https://i.stack.imgur.com/aauzZ.png) and "Limits added" (https://i.stack.imgur.com/HkMrR.png)]

Notice that the second plot loses one of the values below 2 and two of the values above 5. I would like to know how to fix this. A few miscellaneous notes:

First, specifying boundary = min_wt keeps the smallest values (those below 2) in the histogram, but the two values greater than 5 are still dropped:

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too")

[Plot: "Limits added with boundary too" (https://i.stack.imgur.com/6dI1A.png)]

Second, whether the issue appears depends on the value chosen for bins. For example, when I increase bins to 50, no values are dropped:

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 50, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too, but with bins = 50")

[Plot: "Limits added with boundary too, but with bins = 50" (https://i.stack.imgur.com/YIOOS.png)]

Finally, I believe this issue is related to the SO question "geom_histogram: wrong bins?", which is also discussed at https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think the issue comes from a rounding error. I describe this error in more depth in my second post (the one with the graphs in it) on this issue: https://github.com/daattali/ggExtra/issues/81.
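One way to check which bars are at risk of being dropped is to look at the bin edges ggplot2 computes and compare them with the intended limits. The sketch below assumes the same mtcars data and bins = 30 as above. The names bin_data and outside are illustrative and not part of the original post, and the exact edges may differ between ggplot2 versions.

```r
library(ggplot2)

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

p <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(colour = "green", bins = 30)

# layer_data() returns the data computed for a layer: one row per bin,
# including the columns xmin, xmax and count.
bin_data <- layer_data(p)

# Bins with an edge beyond the data range are the ones that can be
# affected once the scale limits are set to c(min_wt, max_wt).
outside <- bin_data[bin_data$xmin < min_wt | bin_data$xmax > max_wt,
                    c("xmin", "xmax", "count")]
print(outside)
```

If outside has any rows with a non-zero count, those bars extend past the limits, which is consistent with the bars that disappear in the plots above.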

Here is my session info:

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] labeling_0.3      colorspace_1.3-2  scales_0.5.0.9000
 [4] compiler_3.4.2    lazyeval_0.2.1    plyr_1.8.4       
 [7] tools_3.4.2       pillar_1.2.1      gtable_0.2.0     
[10] tibble_1.4.2      yaml_2.1.16       Rcpp_0.12.15     
[13] grid_3.4.2        rlang_0.2.0.9000  munsell_0.4.3

asked Mar 10 '18 01:03

Chris

1 Answer

In addition to what @eipi10 mentioned in the comments, another option is to change the oob (out of bounds) argument of scale_x_continuous. The documentation describes it as follows:

"Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA."

The default is scales::censor(). You can change it to oob = scales::squish, which squishes out-of-bounds values into the range of the limits.
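To see the difference between the two functions in isolation, here is a small sketch that applies them to a plain numeric vector. The vector x and the range limits are made-up illustration values, not taken from the plots.

```r
library(scales)

limits <- c(2, 5)               # illustrative limits
x <- c(1.5, 2, 3.5, 5, 5.4)     # two values lie outside the limits

# censor() replaces out-of-range values with NA (the default oob behaviour)
censored <- censor(x, range = limits)

# squish() moves out-of-range values to the nearest limit instead
squished <- squish(x, range = limits)

print(censored)
print(squished)
```

In a histogram, the scale applies this function to the bar edges as well, so with squish a bar that extends past a limit is drawn up to the limit rather than removed. Its count is unchanged, but its drawn width can be narrower than that of the other bars.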

Compare the following two plots.

p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")

warning: Removed 1 rows containing missing values (geom_bar).

[Plot: "default: scales::censor" (https://i.stack.imgur.com/xAtyK.png)]

p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")

[Plot: "using scales::squish" (https://i.stack.imgur.com/tOpaB.png)]

For your third ggplot, where you specified a boundary but the two values greater than 5 were still dropped, the result would look like this:

ggplot(mtcars, aes(x = wt)) + 
 geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
 scale_x_continuous(limits = xlim, oob = scales::squish) +
 ggtitle("Limits added with boundary too") +
 labs(subtitle = "scales::squish")
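A related approach, offered only as a sketch and not necessarily what was suggested in the comments, is to leave the scale limits alone and zoom with coord_cartesian(). Because the zoom happens after the bins are computed, no data are removed. Bars that extend past the limits are clipped at the panel edge instead. The name p_zoom is illustrative, and expand = FALSE is used here so that the panel ends exactly at the limits. Note that it also removes the padding on the y axis.

```r
library(ggplot2)

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

p_zoom <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
  # Zoom the view without changing the scale limits, so nothing is censored
  coord_cartesian(xlim = c(min_wt, max_wt), expand = FALSE) +
  ggtitle("Zoomed with coord_cartesian")

p_zoom
```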

[Plot: "Limits added with boundary too", subtitle "scales::squish" (https://i.stack.imgur.com/wlEeL.png)]

Hope this helps.

answered Oct 11 '22 14:10

markus
