Divide each column sum by the sum of the matrix

Tags: r, dplyr

If I have a dataframe:

d = data.frame(sample=c("a2","a3"),a=c(1,5),b=c(4,5),c=c(6,4))
d
    sample a b c
1     a2 1 4 6
2     a3 5 5 4

How do I divide the sum of each column by the sum of the entire dataframe using dplyr so I end up with a dataframe that looks like:

     a b c
1    6/25 9/25 10/25

I tried to do

d <- d %>%
mutate_if(is.numeric, funs(colSums(d)/sum(d)))

but it keeps returning an error.

Thanks in advance!

Danby asked Dec 09 '22 23:12


2 Answers

In each of these alternatives except 2a and 2b, the first two components of the pipeline can be replaced with d[-1], provided we know that the first column is the only non-numeric one.

1) Base R Base R gives a straightforward solution (the native pipe |> requires R 4.1 or later):

d |> Filter(f = is.numeric) |> colSums() |> prop.table()
##    a    b    c 
## 0.24 0.36 0.40 
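
As a minimal sketch of the d[-1] shortcut mentioned at the top, applied to (1): this assumes the first column is the only non-numeric one, and the name props is only used here for illustration.

```r
# Same data as in the question
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# d[-1] drops the first (non-numeric) column, so no Filter() step is needed
props <- d[-1] |> colSums() |> prop.table()
props
```

The result is the same named vector of proportions as in (1).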

2) dplyr With dplyr:

library(dplyr)

d %>%
  select(where(is.numeric)) %>%
  summarize(across(everything(), sum) / sum(.))
##      a    b   c
## 1 0.24 0.36 0.4

2a) or

d %>%
  summarize(across(where(is.numeric), sum)) %>%
  { . / sum(.) }

2b) The scoped functions, such as the *_if functions, have been superseded by across and are rarely used these days. They are still available, though, so if you want to use them anyway, try this, which is close to the code in the question:

d %>%
  summarize_if(is.numeric, sum) %>%
  { . / sum(.) }
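
For context on why the attempt in the question fails, here is a small base R sketch. It assumes the problem is that colSums(d) and sum(d) are called on the whole data frame, including the character column sample; the names err_colsums, err_sum and num are hypothetical.

```r
# Same data as in the question
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# Both calls raise an error on the full data frame because of the
# character column `sample`; tryCatch() captures the message instead
err_colsums <- tryCatch(colSums(d), error = function(e) conditionMessage(e))
err_sum <- tryCatch(sum(d), error = function(e) conditionMessage(e))

# Restricting to the numeric columns first makes both calls work
num <- d[sapply(d, is.numeric)]
props <- colSums(num) / sum(num)
props
```

Summarizing the numeric columns first and dividing afterwards, as in 2a and 2b, avoids touching the character column at all.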

3) collapse With the collapse package, select the numeric variables (nv), sum each column (fsum), and then divide by the total to get proportions (fsum with TRA = "/"). When I benchmarked it on this data, it ran 3x faster than (1), over 100x faster than (2), and 300x faster than (4).

library(collapse)
d |> nv() |> fsum() |> fsum(TRA = "/")
##    a    b    c 
## 0.24 0.36 0.40 

4) dplyr/tidyr With tidyr and dplyr we can convert to long form, compute each column's share of the total, and convert back to wide form.

library(dplyr)
library(tidyr)
d %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(value = sum(value) / sum(.$value), .groups = "drop") %>%
  pivot_wider()
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1  0.24  0.36   0.4

G. Grothendieck answered Jan 04 '23 02:01


We could use colSums and divide by the sum of the colSums. The -1 index excludes the first column (sample) from the calculation:

result <- colSums(d[,-1])/sum(colSums(d[,-1]))
result

Output:

   a    b    c 
0.24 0.36 0.40 
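
The question asks for a data frame rather than a named vector. As a sketch under that assumption, the vector can be reshaped into a one-row data frame; result_df is a name introduced here for illustration only.

```r
# Same data as in the question
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

result <- colSums(d[, -1]) / sum(colSums(d[, -1]))

# t() turns the named vector into a 1 x 3 matrix with column names a, b, c;
# as.data.frame() then yields a one-row data frame with those columns
result_df <- as.data.frame(t(result))
result_df
```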

TarJae answered Jan 04 '23 03:01