
How can I divide each column by a number in R?

Tags:

r

I have a dataset with 6000 columns, one per gene. Its rows belong to 6 different features (A, B, C, D, E, F), each with unique position numbers. I want to divide the gene values in rows with feature A by 87 and in rows with feature B by 54.

At the end I want to have the sum and average of each row in new columns. How can I do this in R?

feature_A=87
feature_B=54

Input file

 feature pos gene_1 gene_2 gene_3 gene_n
       A   1      6      2     51      0
       A   2      4      5      8      2
       A   3      1     74      5      0
       B   1     11      2     41     89
       B   2      4      5      3      5
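
To try the answers on this sample, the table above can be read into a data frame from inline text. This is only a sketch for reproducing the example: the name `dat` is an assumption (it matches the first answer), and the real file would have about 6000 gene columns rather than four.

```r
# Read the sample input from inline text.
# `dat` is an assumed name; the leading blank line is skipped by read.table().
dat <- read.table(text = "
feature pos gene_1 gene_2 gene_3 gene_n
A 1  6  2 51  0
A 2  4  5  8  2
A 3  1 74  5  0
B 1 11  2 41 89
B 2  4  5  3  5
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types: feature is character, the rest are integer
str(dat)
```

Setting `stringsAsFactors = FALSE` keeps `feature` as plain text, which makes later lookups by feature name less error-prone.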

Output file

 feature pos gene_1 gene_2 gene_3 gene_n  sum_all  average_all
       A   1   6/87   2/87  51/87   0/87 sum_row1 average_row1
       A   2   4/87   5/87   8/87   2/87 sum_row2 average_row2
       A   3   1/87  74/87   5/87   0/87 sum_row3 average_row3
       B   1  11/54   2/54  41/54  89/54 sum_row4 average_row4
       B   2   4/54   5/54   3/54   5/54 sum_row5 average_row5
       B   3   4/54   0/54   5/54  21/54 sum_row6 average_row6
NamAshena asked Mar 12 '23 10:03



2 Answers

This might be made easier by merging your divisors into your main dataset:

feat_div <- data.frame(feature=c("A","B"), value=c(87,54))
#  feature value
#1       A    87
#2       B    54

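# inner join on feature: adds a value column (rows whose feature has no divisor are dropped)
dat <- merge(dat, feat_div)

# pick the gene columns after the merge so the index matches the merged columns
cols <- grepl("^gene_", names(dat))

dat[cols]       <- lapply(dat[cols], `/`, dat$value)
dat$sum_all     <- rowSums(dat[cols])
dat$average_all <- rowMeans(dat[cols])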

#  feature pos     gene_1     gene_2     gene_3     gene_n value   sum_all average_all
#1       A   1 0.06896552 0.02298851 0.58620690 0.00000000    87 0.6781609   0.1695402
#2       A   2 0.04597701 0.05747126 0.09195402 0.02298851    87 0.2183908   0.0545977
#3       A   3 0.01149425 0.85057471 0.05747126 0.00000000    87 0.9195402   0.2298851
#4       B   1 0.20370370 0.03703704 0.75925926 1.64814815    54 2.6481481   0.6620370
#5       B   2 0.07407407 0.09259259 0.05555556 0.09259259    54 0.3148148   0.0787037
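
A possible variation on the merge approach above, in case the extra `value` column or the row re-sorting that `merge()` does by default is unwanted: look the divisor up per row from a named vector. This is a minimal sketch on the question's sample data; the names `divisors`, `gene_cols` and `row_div` are made up for illustration.

```r
# Sample data from the question
dat <- data.frame(
  feature = c("A", "A", "A", "B", "B"),
  pos     = c(1, 2, 3, 1, 2),
  gene_1  = c(6, 4, 1, 11, 4),
  gene_2  = c(2, 5, 74, 2, 5),
  gene_3  = c(51, 8, 5, 41, 3),
  gene_n  = c(0, 2, 0, 89, 5)
)

# Named lookup vector: one divisor per feature (hypothetical name)
divisors <- c(A = 87, B = 54)

# Names of the columns to divide
gene_cols <- grep("^gene_", names(dat), value = TRUE)

# One divisor per row, looked up by feature name.
# as.character() guards against feature being a factor;
# a feature missing from `divisors` would give NA here.
row_div <- unname(divisors[as.character(dat$feature)])

# Divide each gene column elementwise by the per-row divisor
dat[gene_cols]  <- lapply(dat[gene_cols], `/`, row_div)
dat$sum_all     <- rowSums(dat[gene_cols])
dat$average_all <- rowMeans(dat[gene_cols])
```

Unlike the inner join, this keeps every row in its original order; rows with a feature that has no divisor end up as `NA` instead of being dropped.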
thelatemail answered Mar 14 '23 22:03



dplyr can do what you want if you use a lookup table or similar to find the number to divide by:

library(dplyr)

# make a lookup vector
feat_num <- c(A = 87, B = 54)

feat_num
##
##  A  B 
## 87 54 

# group by feature and pos so they don't get divided
df %>% group_by(feature, pos) %>%    
    # divide everything but grouping variables (.) by the number looked up from feat_num;
    # as.character() makes the lookup go by name rather than by factor level code
    mutate_each(funs(. / feat_num[as.character(feature)])) %>%
    # ungroup so next mutate works nicely
    ungroup() %>%    
    # add row sum and mean columns, indexing out the first and second columns
    mutate(sum_all = rowSums(.[-1:-2]), 
           average_all = rowMeans(.[-1:-2]))
##
## Source: local data frame [5 x 8]
## 
##   feature   pos     gene_1     gene_2     gene_3     gene_n   sum_all average_all
##    (fctr) (int)      (dbl)      (dbl)      (dbl)      (dbl)     (dbl)       (dbl)
## 1       A     1 0.06896552 0.02298851 0.58620690 0.00000000 0.6781609   0.1695402
## 2       A     2 0.04597701 0.05747126 0.09195402 0.02298851 0.2183908   0.0545977
## 3       A     3 0.01149425 0.85057471 0.05747126 0.00000000 0.9195402   0.2298851
## 4       B     1 0.20370370 0.03703704 0.75925926 1.64814815 2.6481481   0.6620370
## 5       B     2 0.07407407 0.09259259 0.05555556 0.09259259 0.3148148   0.0787037
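
`mutate_each()` and `funs()` have since been deprecated in dplyr, so the pipeline above may warn or fail on a current installation. A rough equivalent using `across()`, which needs dplyr 1.0.0 or later, could look like the sketch below. It is not the answer's original code: it selects the gene columns by name with `starts_with("gene_")` instead of grouping, and the name `res` is made up for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  feature = c("A", "A", "A", "B", "B"),
  pos     = c(1, 2, 3, 1, 2),
  gene_1  = c(6, 4, 1, 11, 4),
  gene_2  = c(2, 5, 74, 2, 5),
  gene_3  = c(51, 8, 5, 41, 3),
  gene_n  = c(0, 2, 0, 89, 5)
)

# Lookup vector: divisor per feature
feat_num <- c(A = 87, B = 54)

res <- df %>%
  # divide each gene column by the divisor for that row's feature;
  # as.character() makes the lookup go by name even if feature is a factor
  mutate(across(starts_with("gene_"),
                ~ .x / unname(feat_num[as.character(feature)]))) %>%
  # row sum and mean over the gene columns only
  mutate(sum_all     = rowSums(across(starts_with("gene_"))),
         average_all = rowMeans(across(starts_with("gene_"))))

res
```

Selecting by name avoids the `group_by()`/`ungroup()` step and the positional `.[-1:-2]` indexing, so the result does not depend on `feature` and `pos` being the first two columns.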
alistaire answered Mar 14 '23 22:03
