ggplot geom_bar vs geom_histogram

Tags:

r

ggplot2

What is the difference (if any) between geom_bar and geom_histogram in ggplot? They seem to produce the same plot and take the same parameters.

asked Jan 03 '13 at 11:01 by jamborta



People also ask

What is the difference between geom_col() and geom_bar()?

There are two types of bar charts: geom_bar() and geom_col(). geom_bar() makes the height of the bar proportional to the number of cases in each group (or, if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col() instead.
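A minimal sketch of that difference, assuming two made-up data frames: `people` (one row per person) and `totals` (already aggregated). `layer_data()` is only used to inspect the bar heights ggplot2 computes; nothing is drawn.

```r
library(ggplot2)

# Hypothetical raw data: one row per person
people <- data.frame(hair = c("red", "black", "black", "brown", "brown", "brown"))

# geom_bar(): height = number of rows in each group (computed by the stat)
p_count <- ggplot(people, aes(x = hair)) + geom_bar()

# Hypothetical pre-aggregated data: the heights are already a column
totals <- data.frame(hair = c("black", "brown", "red"), n = c(2, 3, 1))

# geom_col(): height = the value of n, used as is
p_value <- ggplot(totals, aes(x = hair, y = n)) + geom_col()

# layer_data() returns the computed data for the first layer;
# both give heights 2, 3, 1 for black, brown, red
layer_data(p_count)$y
layer_data(p_value)$y
```

Same picture, but geom_bar() got there by counting rows, while geom_col() just read the n column.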

What is geom_histogram() in R?

Visualise the distribution of a single continuous variable by dividing the x axis into bins and counting the number of observations in each bin. Histograms (geom_histogram()) display the counts with bars; frequency polygons (geom_freqpoly()) display the counts with lines.
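A small sketch of the bars-versus-lines point, using the `faithful` dataset that ships with base R (the choice of dataset and of `binwidth = 0.5` is only for illustration). Both layers run the same binning; only the geom differs.

```r
library(ggplot2)

# faithful ships with base R; eruptions is a continuous variable (minutes)
p_hist <- ggplot(faithful, aes(x = eruptions)) + geom_histogram(binwidth = 0.5)
p_poly <- ggplot(faithful, aes(x = eruptions)) + geom_freqpoly(binwidth = 0.5)

# Same binning, different geom: bars vs. a line through the bin counts.
# geom_freqpoly() pads with an empty bin at each end, so compare totals.
sum(layer_data(p_hist)$count)
sum(layer_data(p_poly)$count)
```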

What does geom_col() do?

Basically, geom_col() is a wrapper around the geom_bar() geometry with the statistical transformation fixed to identity. This means the x and y positions are taken directly from variables in the selected dataset, with no counting or binning.
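One way to check the "wrapper" claim is a sketch like the following, assuming a made-up pre-aggregated `totals` data frame: build the same plot with geom_col() and with geom_bar(stat = "identity"), then compare the computed layer data.

```r
library(ggplot2)

# Hypothetical pre-aggregated data
totals <- data.frame(hair = c("black", "brown", "red"), n = c(2, 3, 1))

p_col <- ggplot(totals, aes(x = hair, y = n)) + geom_col()
p_bar <- ggplot(totals, aes(x = hair, y = n)) + geom_bar(stat = "identity")

d_col <- layer_data(p_col)
d_bar <- layer_data(p_bar)

# Same heights and the same bar extents
all.equal(as.numeric(d_col$y), as.numeric(d_bar$y))
all.equal(as.numeric(d_col$xmin), as.numeric(d_bar$xmin))
```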

How do I change the bin size in ggplot2?

To change the number of bins in a histogram made with the ggplot2 package in R, use the bins argument of geom_histogram(). It sets the number of bars (bins) that the whole range of the data is divided into. Alternatively, the binwidth argument sets the width of each bin directly, in the units of the x variable; if both are given, binwidth overrides bins.
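A short sketch of the two options, using the `waiting` column of base R's `faithful` dataset (an arbitrary choice for illustration). `layer_data()` is used to confirm the resulting bins.

```r
library(ggplot2)

# faithful$waiting: minutes between eruptions (base R dataset)
# Option 1: fix the number of bins
p_bins <- ggplot(faithful, aes(x = waiting)) + geom_histogram(bins = 10)

# Option 2: fix the width of each bin, in the units of x
p_width <- ggplot(faithful, aes(x = waiting)) + geom_histogram(binwidth = 5)

nrow(layer_data(p_bins))  # 10 bins

d <- layer_data(p_width)
range(d$xmax - d$xmin)    # every bin is 5 minutes wide (up to rounding)
```

Passing `binwidth` is usually the easier one to reason about when the x variable has natural units.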


2 Answers

  • Bar charts provide a visual presentation of categorical data. Examples:
    • The number of people with red, black and brown hair
    • Look at the geom_bar help file. The examples are all counts.
    • Wikipedia page
  • Histograms are used to plot the density of interval (usually numeric) data. Examples:
    • Distributions of age and height
    • geom_histogram help file. The examples are distributions of movie ratings.
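To make the categorical-versus-continuous distinction concrete, here is a sketch using base R's `mtcars` dataset (chosen only because it has one variable of each kind): `cyl` treated as a category, and `mpg` as a continuous measurement.

```r
library(ggplot2)

# mtcars ships with base R.
# Categorical variable (number of cylinders): a bar chart, one bar per category
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Continuous variable (miles per gallon): a histogram over binned x values
p_hist <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2.5)

layer_data(p_bar)$count        # cars with 4, 6 and 8 cylinders: 11 7 14
sum(layer_data(p_hist)$count)  # all 32 cars fall into some bin
```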

ggplot2

After a bit more investigating, I think in ggplot2 there is no difference between geom_bar and geom_histogram. From the docs:

 geom_histogram(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)
 geom_bar(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)

I realise that in the geom_histogram docs it states:

geom_histogram is an alias for geom_bar plus stat_bin

but to be honest, I'm not really sure what this means, since my understanding of ggplot2 is that both stat_bin and geom_bar are layers (with a slightly different emphasis).
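One way to read "an alias for geom_bar plus stat_bin" is that the following three layers ask for the same thing: a bar geom drawn from binned counts. A sketch, using base R's `faithful` data and `bins = 20` purely for illustration:

```r
library(ggplot2)

# Three ways to ask for the same layer: bar geom + bin stat
p_hist <- ggplot(faithful, aes(x = waiting)) + geom_histogram(bins = 20)
p_bar  <- ggplot(faithful, aes(x = waiting)) + geom_bar(stat = "bin", bins = 20)
p_stat <- ggplot(faithful, aes(x = waiting)) + stat_bin(geom = "bar", bins = 20)

# The computed bin counts agree
all.equal(layer_data(p_hist)$count, layer_data(p_bar)$count)
all.equal(layer_data(p_hist)$count, layer_data(p_stat)$count)
```

In other words, a geom_* call picks a geometry and a default stat, and a stat_* call picks a stat and a default geometry; both end up as one layer.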

answered Oct 02 '22 at 23:10 by csgillespie



The default behavior is the same for both geom_bar and geom_histogram. This is because, as @csgillespie mentioned, there is an implied stat_bin when you call geom_histogram (understandable), and it is also the default statistical transformation applied to geom_bar (arguable behavior, IMO). That's why you need to specify stat='identity' when you want to plot the data as is.
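A sketch of why stat='identity' matters, assuming a made-up pre-summarised `totals` data frame. In current ggplot2 the default stat of geom_bar() is "count" rather than "bin", but the consequence is the same: without stat = "identity" the bars show counts of rows, not your values.

```r
library(ggplot2)

# Hypothetical pre-summarised data: one row per hair colour
totals <- data.frame(hair = c("black", "brown", "red"), n = c(2, 3, 1))

# Default stat: geom_bar() counts rows, so every bar has height 1
p_default <- ggplot(totals, aes(x = hair)) + geom_bar()

# stat = "identity": the bar heights are taken from n as is
p_identity <- ggplot(totals, aes(x = hair, y = n)) + geom_bar(stat = "identity")

layer_data(p_default)$y   # 1 1 1
layer_data(p_identity)$y  # 2 3 1
```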

The stat='bin' (or stat_bin()) is a statistical transformation that ggplot does for you. It outputs computed variables whose names are surrounded by two dots (..count.. and ..density..). If you don't specify stat='bin', you won't get those variables.
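A minimal sketch of using one of those computed variables, with base R's `faithful` data as an arbitrary example. The two-dot form `..density..` is the older spelling; my understanding is that ggplot2 3.3.0 and later prefer `after_stat(density)`, which is what this uses.

```r
library(ggplot2)

# Older spelling: aes(y = ..density..); newer ggplot2 prefers after_stat()
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5)

d <- layer_data(p)

# With density on the y axis the bar areas add up to 1
sum(d$y * (d$xmax - d$xmin))
```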

answered Oct 02 '22 at 22:10 by yahiaelgamal
