
How to plot density curves for each column in R?

I have a data frame w like this:

> head(w, 3)
         V1        V2         V3        V4 V5        V6         V7        V8        V9       V10 V11        V12        V13        V14
1 0.2446884 0.3173719 0.74258410 0.0000000  0 0.0000000 0.01962759 0.0000000 0.0000000 0.5995647   0 0.30201691 0.03109935 0.16897571
2 0.0000000 0.0000000 0.08592243 0.2254971  0 0.7381867 0.11936323 0.2076167 0.0000000 1.0587742   0 0.50226734 0.51295661 0.01298853
3 8.4293893 4.9985040 2.22526463 0.0000000  0 3.6600283 0.00000000 0.0000000 0.2573714 0.8069288   0 0.05074886 0.00000000 0.59403855
         V15       V16      V17       V18      V19       V20       V21      V22         V23        V24       V25       V26       V27
1 0.00000000 0.0000000 0.000000 0.1250837 0.000000 0.5468143 0.3503245 0.000000 0.183144204 0.23026538 6.9868429 1.5774150 0.0000000
2 0.01732732 0.8064441 0.000000 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.015123385 0.07580794 0.6160713 0.7452335 0.0740328
3 2.66846151 0.0000000 1.453987 0.0000000 1.875298 0.0000000 0.0000000 0.893363 0.004249061 0.00000000 1.6185897 0.0000000 0.7792773
        V28 V29     V30       V31        V32        V33       V34       V35 V36        V37        V38       V39        V40    refseq
1 0.5543028   0 0.00000 0.0000000 0.08293075 0.18261450 0.3211127 0.2765295   0 0.04230929 0.05017316 0.3340662 0.00000000 NM_000014
2 0.0000000   0 0.00000 0.0000000 0.00000000 0.03531411 0.0000000 0.4143325   0 0.14894716 0.58056304 0.3310173 0.09162460 NM_000015
3 0.8047882   0 0.88308 0.7207709 0.01574767 0.00000000 0.0000000 0.1183736   0 0.00000000 0.00000000 1.3529881 0.03720155 NM_000016

> dim(w)
[1] 37126    41

I want to plot the density curve of each column (except the last one) on a single page. It seems that ggplot2 can do this.

I tried this according to this post:

ggplot(data=w[,-41], aes_string(x=colnames)) + geom_density()

But it fails with this error:

Error in as.character(x) : 
  cannot coerce type 'closure' to vector of type 'character'

I'm not sure how to convert this data frame into a format that ggplot2 accepts. Or is there another way to do this in R?

Hanfei Sun asked Jun 04 '13 03:06


1 Answer

ggplot needs your data in a long format, like so:

  variable      value
1       V1 0.24468840
2       V1 0.00000000
3       V1 8.42938930
4       V2 0.31737190
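
As a rough illustration of how a wide data frame maps onto that layout, here is a sketch using base R's stack(). The three-row data frame is a cut-down stand-in built from the values in the question, and the name `long` is made up for this example.

```r
# Cut-down stand-in for the asker's data frame `w` (assumption: two numeric
# columns are enough to show the shape of the result)
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

# stack() concatenates the selected columns into `values` and records the
# source column name in the factor `ind`
long <- stack(w[c("V1", "V2")])

# Rename and reorder to match the variable/value layout shown above
long <- data.frame(variable = long$ind, value = long$values)
print(long)
```

Row 4 of `long` is the first V2 value, matching the fourth row of the table above.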

Once the data is melted into a long data frame, you can group the density curves by variable. In the snippet below, ggplot uses the w.plot data frame for plotting. There is no need to drop the final refseq column first, because melt treats non-numeric columns such as refseq as id variables by default. You can modify the plot to use facets, different colours, fills, etc.

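library(ggplot2)
library(reshape2)

# Small reproducible subset of the question's data (first three rows, V1 to V3)
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016")
)

# Reshape to long format: refseq stays as the id column, and V1..V3 are
# stacked into `variable` (source column name) and `value`
w.plot <- melt(w, id.vars = "refseq")

# One density curve per original column, distinguished by colour
p <- ggplot(w.plot, aes(x = value, colour = variable))
p + geom_density()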

Example plot
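
With 40 columns, overlaid curves may be hard to tell apart, so the facet variation mentioned above could look something like this. It is only a sketch: it assumes reshape2 and ggplot2 are installed, and the data frame is simulated (the rexp draws, the 200-row size, and the generated refseq labels are made up for illustration, not taken from the question).

```r
library(ggplot2)
library(reshape2)

# Simulated stand-in for the asker's data frame `w` (hypothetical values)
set.seed(1)
w <- data.frame(
  V1 = rexp(200),
  V2 = rexp(200, rate = 2),
  V3 = rexp(200, rate = 0.5),
  refseq = sprintf("NM_%06d", 1:200)
)

# Long format: one row per (refseq, column) pair
w.plot <- melt(w, id.vars = "refseq")

# One panel per original column; free scales let each panel choose its own axes
p <- ggplot(w.plot, aes(x = value)) +
  geom_density() +
  facet_wrap(~ variable, scales = "free")
print(p)
```

Dropping scales = "free" keeps a shared axis across panels, which makes the columns easier to compare at the cost of squashing the ones with a narrow range.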

Andrew answered Sep 20 '22 17:09