I have a list of protein names (P1, P2, ..., Pn), and they are categorized into three expression levels, High (H), Medium (M) and Low (L), as measured in three experimental conditions (Exp1, Exp2, and Exp3).
I wish to make a plot as shown in the bottom part of the figure, with the names of the proteins at the left and the names of the experiments along the top, and with the high, medium and low categories indicated by red, blue and green respectively.
I'm new to R, and I would much appreciate any help.
Thanks in advance
To graph categorical data, one typically uses bar charts and pie charts. Bar chart: bar charts use rectangular bars to plot each category against its quantity. Pie chart: pie charts are circular graphs in which each slice has an arc length proportional to its quantity.
Categorical variables can be easily visualized with a mosaic plot. A mosaic plot can show one or more categorical variables, and it is drawn from the frequency of each category in those variables. To create a mosaic plot in base R, we can use the mosaicplot function.
Categorical scatter plots: both strip plots and swarm plots are essentially scatter plots where one variable is categorical. I like to use them as additions to other kinds of plots, as they are useful for quickly visualizing the number of data points in a group.
With categorical or discrete data a bar chart is typically your best option. A bar chart places the separate values of the data on the x-axis and the height of the bar indicates the count of that category.
You can create a file with data formatted like this (tab delimited):
pv exp val
1 1 H
2 1 L
3 1 L
4 1 M
1 2 H
2 2 H
3 2 M
4 2 H
1 3 L
2 3 L
3 3 L
4 3 M
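Then use the following commands to read and plot the data:
mat <- read.table(file.choose(), header=TRUE)
# read the file into a data frame
attach(mat)
# make the columns (pv, exp, val) available as variables
plot(pv~exp, col=as.integer(factor(val)))
# plot pv against exp, coloring each point by its val category (H, L, M)
The factor() conversion is needed because, since R 4.0, read.table() reads text columns as character strings rather than factors, and "H", "M" and "L" are not valid color names. Converting val to a factor and then to integers lets R pick a palette color for each category on its own. You can also build a color vector from val to translate (H,M,L) to the colors you want, for example (Red,Blue,Green)... but there is other documentation out there for that.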
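The color translation mentioned above can be sketched in base R roughly as follows. This is only a sketch: the data frame is the sample file typed inline, and the names level_colors and point_colors, as well as the High = red, Medium = blue, Low = green mapping (taken from the question), are assumptions for illustration.

```r
# Same sample data as the tab-delimited file above, built inline
mat <- data.frame(
  pv  = rep(1:4, times = 3),
  exp = rep(1:3, each = 4),
  val = c("H", "L", "L", "M",  "H", "H", "M", "H",  "L", "L", "L", "M")
)

# Assumed mapping, taken from the question: High = red, Medium = blue, Low = green
level_colors <- c(H = "red", M = "blue", L = "green")

# Index the named vector by category to get one color per row
point_colors <- unname(level_colors[as.character(mat$val)])

# Draw filled squares without the default axes, then add labeled axes by hand
plot(mat$exp, mat$pv, col = point_colors, pch = 15, cex = 2,
     axes = FALSE, xlab = "", ylab = "", xlim = c(0.5, 4))
axis(3, at = 1:3, labels = paste0("Exp", 1:3))         # experiment names along the top
axis(2, at = 1:4, labels = paste0("P", 1:4), las = 1)  # protein names on the left
legend("right", legend = names(level_colors), col = level_colors,
       pch = 15, bty = "n")
```

Looking the colors up by name keeps the mapping explicit, so it does not depend on the alphabetical order in which R would otherwise number the categories.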
Here is an approach that uses some of the magic of the ggplot2 and reshape2 packages.
First, recreate the data in the format you described:
df <- data.frame(
P = paste("P", 1:4, sep=""),
Exp1 = c("L", "H", "L", "M"),
Exp2 = c("M", "M", "L", "H"),
Exp3 = c("H", "L", "L", "M"))
Next, load the add-on packages:
library(reshape2)
library(ggplot2)
Then, use melt() to convert your data from wide format to tall format. The id variable is "P", and we tell the function to name the "variable" column "Exp" through the variable.name argument:
mdf <- melt(df, id.vars="P", variable.name="Exp")
Because L - M - H has a semantic order, we use the ordered parameter of factor() to inform R of this order:
mdf$value <- factor(mdf$value, levels=c("H", "M", "L"), ordered=TRUE)
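To see what the levels argument does, here is a tiny standalone illustration. It uses a made-up three-element vector rather than the melted data, so treat it as a sketch of the idea only.

```r
# Toy vector, not the melted data: three categories in arbitrary order
value <- factor(c("L", "H", "M"), levels = c("H", "M", "L"), ordered = TRUE)

levels(value)      # "H" "M" "L" : the order given in levels=, not alphabetical
as.integer(value)  # 3 1 2 : position of each element within that order
sort(value)        # sorts as H, M, L
```

ggplot2 uses this level order for the legend entries, and also for matching an unnamed vector of manual colors to the categories.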
Finally, we are ready to plot your data:
ggplot(mdf, aes(x=Exp, y=P, colour=value)) +
geom_point(size=3) +
# the argument is values (plural); colours follow the level order H, M, L
scale_colour_manual(values=c("red", "blue", "green")) +
xlab("") +
ylab("")
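If you want something closer to the layout described in the question (filled cells, experiment names along the top, P1 in the first row), one possible variation is sketched below. It is an alternative and not part of the approach above: it builds the tall data with base R instead of melt(), swaps geom_point() for geom_tile(), and assumes a ggplot2 version that supports position = "top" for axis scales.

```r
library(ggplot2)

df <- data.frame(
  P = paste0("P", 1:4),
  Exp1 = c("L", "H", "L", "M"),
  Exp2 = c("M", "M", "L", "H"),
  Exp3 = c("H", "L", "L", "M"))

# Tall format built by hand: one row per protein/experiment pair
mdf <- data.frame(
  P     = rep(as.character(df$P), times = 3),
  Exp   = rep(c("Exp1", "Exp2", "Exp3"), each = nrow(df)),
  value = factor(c(as.character(df$Exp1), as.character(df$Exp2),
                   as.character(df$Exp3)),
                 levels = c("H", "M", "L")))

p <- ggplot(mdf, aes(x = Exp, y = P, fill = value)) +
  geom_tile(colour = "white") +                         # one cell per pair
  scale_fill_manual(values = c(H = "red", M = "blue", L = "green")) +
  scale_x_discrete(position = "top") +                  # experiments on top
  scale_y_discrete(limits = rev(as.character(df$P))) +  # P1 in the first row
  xlab("") +
  ylab("")
print(p)
```

Naming the elements of the colour vector ties each colour to its category explicitly, so the result does not change if the factor levels are reordered later.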