How do I generate a histogram for each column of my table?

Tags: r, ggplot2

I have a table of data with one column per lab test and one row per study subject.

I want to generate a series of histograms showing the distribution of values for each lab test (i.e. column). Each set of lab values would ideally have a different bin width (some are integers with a range of hundreds, some are numeric with a range of 2-3).

How do I do that?

veldhoen asked Feb 12 '16 21:02


2 Answers

If you combine the tidyr and ggplot2 packages, you can use facet_wrap to make a quick set of histograms of each variable in your data.frame.

You need to reshape your data to long form with tidyr::gather, so that you have key and value columns like this:

library(tidyr)
library(ggplot2)
# or `library(tidyverse)`

mtcars %>% gather() %>% head()
#>   key value
#> 1 mpg  21.0
#> 2 mpg  21.0
#> 3 mpg  22.8
#> 4 mpg  21.4
#> 5 mpg  18.7
#> 6 mpg  18.1
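
In current tidyr releases, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape is below; it names the output columns key and value so the rest of this answer applies unchanged. The variable name long is only illustrative. The rows come out in a different order than with gather(), which does not matter for the histograms.

```r
library(tidyr)

# Reshape every column of mtcars into key/value pairs,
# mirroring what gather(mtcars) produces (row order differs).
long <- pivot_longer(
  mtcars,
  cols = everything(),
  names_to = "key",
  values_to = "value"
)

head(long)
```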

Using this as our data, we can map value as our x variable, and use facet_wrap to separate by the key column:

ggplot(gather(mtcars), aes(value)) + 
    geom_histogram(bins = 10) + 
    facet_wrap(~key, scales = 'free_x')

The scales = 'free_x' is necessary unless your data is all of a similar scale.

You can replace bins = 10 with any expression that evaluates to a number, which with some creativity may let you tune the binning per variable. Alternatively, you can set binwidth, which may be more practical depending on what your data looks like. Either way, binning will take some finesse.
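
One way to get a different bin width per variable is to pass a function as binwidth; ggplot2 then calls it on the values in each facet. The sketch below uses the Freedman-Diaconis rule. The helper name fd_width and its fallback (a tenth of the range when the IQR is zero) are assumptions made for illustration, not part of ggplot2.

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper: Freedman-Diaconis bin width for one variable.
# Falls back to range / 10 when the IQR is zero (e.g. mostly-constant
# columns); that fallback is an arbitrary choice for this sketch.
fd_width <- function(x) {
  x <- x[is.finite(x)]
  w <- 2 * IQR(x) / length(x)^(1 / 3)
  if (w > 0) w else diff(range(x)) / 10
}

# binwidth accepts a function, which is evaluated separately per facet
p <- ggplot(gather(mtcars), aes(value)) +
  geom_histogram(binwidth = fd_width) +
  facet_wrap(~key, scales = "free_x")

print(p)
```

A column whose values are all identical would still yield a width of zero here, so such columns would need to be dropped or handled separately.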

alistaire answered Oct 01 '22 01:10


You could generate the plots in a for loop like this, assuming your data frame is named df and the histograms should start at column 2 (with column 1 holding your id):

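# seq_len(...)[-1] skips column 1 and is empty (no iterations) if df has only one column
for (col in seq_len(ncol(df))[-1]) {
    # df[[col]] returns a plain vector, including when df is a tibble
    hist(df[[col]], main = names(df)[col], xlab = names(df)[col])
}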

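By default, the hist function chooses its breaks automatically (Sturges' rule), which usually gives a reasonable bin width. To request roughly the same number of bins for all histograms, add the breaks argument; R treats a single number as a suggestion and rounds to "pretty" break points: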

hist(df[,col], breaks=10)
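
If each column needs its own bin width, breaks also accepts a vector of explicit break points. The sketch below assumes a named vector of widths, one per column; the three mtcars columns and the widths chosen for them are only stand-ins for real lab data.

```r
# Stand-in data: three numeric columns with very different ranges
df <- mtcars[, c("mpg", "hp", "wt")]

# Hypothetical bin widths, one per column (chosen by eye for this example)
widths <- c(mpg = 2, hp = 25, wt = 0.5)

hists <- list()
for (name in names(widths)) {
  x <- df[[name]]
  w <- widths[[name]]
  # Break points on multiples of w that cover the full range of x
  brks <- seq(floor(min(x) / w) * w, ceiling(max(x) / w) * w, by = w)
  hists[[name]] <- hist(x, breaks = brks, main = name, xlab = name)
}
```

Passing breaks = "FD" or breaks = "Scott" instead lets hist choose the width per column with a different built-in rule.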

If you use RStudio, all your plots will automatically be kept in the Plots pane, where you can page through them. If not, you will need to save each plot to a separate file inside the loop, as explained here: http://www.r-bloggers.com/automatically-save-your-plots-to-a-folder/
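
A minimal sketch of saving one file per column from inside the loop, using only base R graphics devices. The output folder (a subfolder of tempdir()) and the use of mtcars as stand-in data are assumptions; swap pdf() for png() if you prefer image files.

```r
# Stand-in data; replace with your own data frame
df <- mtcars

# Assumed output location: a "histograms" folder under the session temp dir
out_dir <- file.path(tempdir(), "histograms")
dir.create(out_dir, showWarnings = FALSE)

for (name in names(df)) {
  # Open a file device, draw the histogram into it, then close the device
  pdf(file.path(out_dir, paste0(name, ".pdf")))
  hist(df[[name]], main = name, xlab = name)
  dev.off()
}

saved <- list.files(out_dir, pattern = "\\.pdf$")
print(saved)
```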

KTWillow answered Oct 01 '22 01:10