 

Plot of categorical data using R

I have a list of proteins (P1, P2, ..., Pn), and each one is assigned one of three expression levels, High (H), Medium (M) or Low (L), in each of three experimental conditions (Exp1, Exp2 and Exp3).

[Figure: example data and the desired plot]

I wish to make a plot like the one in the bottom part of the figure, with the protein names along the left, the experiment names along the top, and the high, medium and low categories shown in red, blue and green respectively.

I'm new to R, so I would much appreciate any help.

Thanks in advance

asked Apr 22 '11 22:04 by WoA



2 Answers

You can create a file with data formatted like this (tab delimited):

pv   exp  val
1    1    H
2    1    L
3    1    L
4    1    M
1    2    H
2    2    H
3    2    M
4    2    H
1    3    L
2    3    L
3    3    L
4    3    M

Then use the following commands to read and plot them:

mat <- read.table(file.choose(), header = TRUE)  # read the tab-delimited file into a data frame

# Translate the categories (H, M, L) into explicit colours, as in the question
cols <- c(H = "red", M = "blue", L = "green")

# Plot protein (pv) against experiment (exp), coloured by category (val).
# as.character() makes the lookup work whether val was read as text or as a factor.
plot(pv ~ exp, data = mat, col = cols[as.character(mat$val)], pch = 19)
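For a reproducible test without file.choose(), the same table can be read from a string with read.table(text = ...). The sketch below is one possible base-graphics layout, not the answerer's code: it assumes the H/M/L colour mapping from the question, hides the default axes, puts the experiment labels along the top with axis(3), and reverses the y limits so P1 sits at the top. The labels "Exp1".."Exp3" and "P1".."P4" are illustrative.

```r
# Self-contained sketch: read the example table from a string instead of a file
txt <- "pv exp val
1 1 H
2 1 L
3 1 L
4 1 M
1 2 H
2 2 H
3 2 M
4 2 H
1 3 L
2 3 L
3 3 L
4 3 M"
mat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Colour lookup taken from the question: High = red, Medium = blue, Low = green
cols <- c(H = "red", M = "blue", L = "green")
pt_cols <- unname(cols[mat$val])

# Draw large square markers without axes; reversed ylim puts protein 1 at the top
plot(mat$exp, mat$pv, col = pt_cols, pch = 15, cex = 3,
     axes = FALSE, xlab = "", ylab = "",
     xlim = c(0.5, 3.5), ylim = c(4.5, 0.5))
axis(3, at = 1:3, labels = paste0("Exp", 1:3))          # side 3 = top: experiments
axis(2, at = 1:4, labels = paste0("P", 1:4), las = 1)   # side 2 = left: proteins
```

Using an explicit named lookup vector keeps the colour assignment independent of how R happens to order the factor levels.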

answered Nov 01 '22 10:11 by Damian


Here is an approach that uses some of the magic of the ggplot2 and reshape2 packages.

First, recreate the data in the format you described:

df <- data.frame(
    P    = paste("P", 1:4, sep=""),
    Exp1 = c("L", "H", "L", "M"),
    Exp2 = c("M", "M", "L", "H"),
    Exp3 = c("H", "L", "L", "M"))

Next, load the add-on packages:

library(reshape2)
library(ggplot2)

Then, use melt() to convert your data from wide format to long ("tall") format. The id variable is "P", and variable.name = "Exp" tells melt() to call the new key column "Exp" instead of the default "variable":

mdf <- melt(df, id.vars = "P", variable.name = "Exp")
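If you would rather not depend on reshape2, the same wide-to-long step can be sketched in base R. This is a swapped-in technique, not how melt() works internally; it recreates the df from above (with stringsAsFactors = FALSE as an assumption) so the snippet runs on its own, and the name mdf_base is hypothetical.

```r
# Wide data as in the answer above
df <- data.frame(
    P    = paste("P", 1:4, sep = ""),
    Exp1 = c("L", "H", "L", "M"),
    Exp2 = c("M", "M", "L", "H"),
    Exp3 = c("H", "L", "L", "M"),
    stringsAsFactors = FALSE)

exp_cols <- setdiff(names(df), "P")  # the measurement columns

# unlist() concatenates the columns one after another (all of Exp1, then Exp2, ...),
# so P repeats as a block and Exp repeats each name nrow(df) times to match
mdf_base <- data.frame(
    P     = rep(df$P, times = length(exp_cols)),
    Exp   = rep(exp_cols, each = nrow(df)),
    value = unlist(df[exp_cols], use.names = FALSE),
    stringsAsFactors = FALSE)
```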

Because L, M and H have a semantic order, we pass an explicit levels vector and ordered = TRUE to factor() to tell R about it (the levels listed here, H, M, L, also set the order used in the legend):

mdf$value <- factor(mdf$value, levels=c("H", "M", "L"), ordered=TRUE)
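To see what ordered = TRUE changes, here is a small sketch on a made-up vector x (hypothetical data). Comparisons, sort() and max() follow the declared level order instead of alphabetical order. It declares L < M < H; note that with levels = c("H", "M", "L"), as in the line above, R would treat H as the lowest level.

```r
x <- factor(c("M", "H", "L", "H"), levels = c("L", "M", "H"), ordered = TRUE)

levels(x)   # "L" "M" "H"
x > "L"     # TRUE TRUE FALSE TRUE: comparison against a level name
sort(x)     # values L M H H, sorted by the declared order, not alphabetically
max(x)      # H
```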

Finally, we are ready to plot your data:

ggplot(mdf, aes(x = Exp, y = P, colour = value)) +
    geom_point(size = 3) +
    scale_colour_manual(values = c(H = "red", M = "blue", L = "green")) +  # colours from the question
    xlab("") +
    ylab("")
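If the figure in the question is a grid of coloured cells rather than dots, a geom_tile() variant may be closer. This is a sketch under assumptions: it rebuilds the same df, reverses the protein factor levels so P1 appears at the top, and uses scale_x_discrete(position = "top"), which needs ggplot2 2.2.0 or later, to place the experiment names along the top.

```r
library(reshape2)
library(ggplot2)

df <- data.frame(
    P    = paste("P", 1:4, sep = ""),
    Exp1 = c("L", "H", "L", "M"),
    Exp2 = c("M", "M", "L", "H"),
    Exp3 = c("H", "L", "L", "M"),
    stringsAsFactors = FALSE)

mdf <- melt(df, id.vars = "P", variable.name = "Exp")
mdf$value <- factor(mdf$value, levels = c("H", "M", "L"))
# ggplot2 draws the first level at the bottom, so reverse to put P1 on top
mdf$P <- factor(mdf$P, levels = rev(as.character(df$P)))

p <- ggplot(mdf, aes(x = Exp, y = P, fill = value)) +
    geom_tile(colour = "white") +              # one filled cell per protein/experiment
    scale_fill_manual(values = c(H = "red", M = "blue", L = "green")) +
    scale_x_discrete(position = "top") +       # experiment names along the top
    labs(x = NULL, y = NULL, fill = NULL)
print(p)
```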


answered Nov 01 '22 08:11 by Andrie