I have a dataset with 6000 columns, each named after a gene. The rows belong to 6 different features (A, B, C, D, E, F), each with unique position numbers. I want to divide the gene values by 87 in rows with feature A and by 54 in rows with feature B.
At the end I want the sum and average of each row in new columns. How can I do this in R?
feature_A=87
feature_B=54
Input file
feature pos gene_1 gene_2 gene_3 gene_n
A 1 6 2 51 0
A 2 4 5 8 2
A 3 1 74 5 0
B 1 11 2 41 89
B 2 4 5 3 5
Output file
feature pos gene_1 gene_2 gene_3 gene_n sum_all average_all
A 1 6/87 2/87 51/87 0/87 sum_row1 average_row1
A 2 4/87 5/87 8/87 2/87 sum_row2 average_row2
A 3 1/87 74/87 5/87 0/87 sum_row3 average_row3
B 1 11/54 2/54 41/54 89/54 sum_row4 average_row4
B 2 4/54 5/54 3/54 5/54 sum_row5 average_row5
B 3 4/54 0/54 5/54 21/54 sum_row6 average_row6
This might be made easier by merging your divisors into your main dataset:
feat_div <- data.frame(feature=c("A","B"), value=c(87,54))
# feature value
#1 A 87
#2 B 54
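dat <- merge(dat, feat_div)
# select the gene columns by name after merging, so the added value column is never picked up
cols <- grep("^gene_", names(dat), value = TRUE)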
dat[cols] <- lapply(dat[cols], `/`, dat$value)
dat$sum_all <- rowSums(dat[cols])
dat$average_all <- rowMeans(dat[cols])
# feature pos gene_1 gene_2 gene_3 gene_n value sum_all average_all
#1 A 1 0.06896552 0.02298851 0.58620690 0.00000000 87 0.6781609 0.1695402
#2 A 2 0.04597701 0.05747126 0.09195402 0.02298851 87 0.2183908 0.0545977
#3 A 3 0.01149425 0.85057471 0.05747126 0.00000000 87 0.9195402 0.2298851
#4 B 1 0.20370370 0.03703704 0.75925926 1.64814815 54 2.6481481 0.6620370
#5 B 2 0.07407407 0.09259259 0.05555556 0.09259259 54 0.3148148 0.0787037
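Note that `merge()` performs an inner join by default, so rows whose feature has no entry in `feat_div` (C to F in the question) would be dropped. As a minimal sketch of one way around that, assuming unlisted features should simply stay undivided, a named lookup vector can supply the divisor per row. The `C` row in the sample data and the names `feat_num`, `divisor` and `gene_cols` are made up for illustration:

```r
# Sample data from the question, plus one hypothetical "C" row
dat <- read.table(text = "feature pos gene_1 gene_2 gene_3 gene_n
A 1 6 2 51 0
A 2 4 5 8 2
A 3 1 74 5 0
B 1 11 2 41 89
B 2 4 5 3 5
C 1 10 20 30 40", header = TRUE, stringsAsFactors = FALSE)

# Named lookup vector: feature -> divisor
feat_num <- c(A = 87, B = 54)

# Look up each row's divisor by name; features not in feat_num give NA
divisor <- unname(feat_num[as.character(dat$feature)])
divisor[is.na(divisor)] <- 1  # assumption: unlisted features are left as they are

# Select gene columns by name so later added columns are never included
gene_cols <- grep("^gene_", names(dat), value = TRUE)

# A vector of length nrow(dat) is applied down every selected column
dat[gene_cols] <- dat[gene_cols] / divisor

dat$sum_all <- rowSums(dat[gene_cols])
dat$average_all <- rowMeans(dat[gene_cols])
```

Indexing with `as.character()` matters if `feature` is a factor, because a factor index would be treated as integer codes rather than names.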
dplyr can do what you want, if you use a lookup table or the like to look up the number to divide by:
library(dplyr)
# make a lookup vector
feat_num <- c(A = 87, B = 54)
feat_num
##
## A B
## 87 54
# group by feature and pos so they don't get divided
df %>% group_by(feature, pos) %>%
# divide everything but grouping variables (.) by the number looked up from feat_num;
# as.character() makes the lookup go by name even when feature is a factor
mutate_each(funs(. / feat_num[as.character(feature)])) %>%
# ungroup so next mutate works nicely
ungroup() %>%
# add row sum and mean columns, indexing out the first and second columns
mutate(sum_all = rowSums(.[-1:-2]),
average_all = rowMeans(.[-1:-2]))
##
## Source: local data frame [5 x 8]
##
## feature pos gene_1 gene_2 gene_3 gene_n sum_all average_all
## (fctr) (int) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1 A 1 0.06896552 0.02298851 0.58620690 0.00000000 0.6781609 0.1695402
## 2 A 2 0.04597701 0.05747126 0.09195402 0.02298851 0.2183908 0.0545977
## 3 A 3 0.01149425 0.85057471 0.05747126 0.00000000 0.9195402 0.2298851
## 4 B 1 0.20370370 0.03703704 0.75925926 1.64814815 2.6481481 0.6620370
## 5 B 2 0.07407407 0.09259259 0.05555556 0.09259259 0.3148148 0.0787037
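`mutate_each()` and `funs()` are deprecated in newer dplyr releases. A rough equivalent using `across()` might look like the sketch below; it assumes dplyr 1.0.0 or later, and the names `gene_cols` and `res` are made up for illustration:

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "feature pos gene_1 gene_2 gene_3 gene_n
A 1 6 2 51 0
A 2 4 5 8 2
A 3 1 74 5 0
B 1 11 2 41 89
B 2 4 5 3 5", header = TRUE, stringsAsFactors = FALSE)

# Named lookup vector: feature -> divisor
feat_num <- c(A = 87, B = 54)

# Gene columns selected by name
gene_cols <- grep("^gene_", names(df), value = TRUE)

# Divide every gene column by the divisor looked up from each row's feature
res <- df %>%
  mutate(across(all_of(gene_cols),
                ~ .x / unname(feat_num[as.character(feature)])))

# Row sum and mean over the gene columns only
res$sum_all <- rowSums(res[gene_cols])
res$average_all <- rowMeans(res[gene_cols])
```

No grouping is needed here, because the divisor is looked up row by row inside `mutate()`.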