Divide each column sum by the sum of the matrix

Question

If I have a dataframe:

d = data.frame(sample=c("a2","a3"),a=c(1,5),b=c(4,5),c=c(6,4))
d
    sample a b c
1     a2 1 4 6
2     a3 5 5 4

How do I divide the sum of each column by the sum of the entire dataframe using dplyr so I end up with a dataframe that looks like:

     a b c
1    6/25 9/25 10/25

I tried to do

d <- d %>%
mutate_if(is.numeric, funs(colSums(d)/sum(d)))

but it keeps returning an error.

Thanks in advance!

G. Grothendieck · Accepted Answer

In every alternative except (2a) and (2b), the first two components of the pipeline can be replaced with d[-1] if we can assume that only the first column is non-numeric.

1) Base R With base R we get a straightforward solution:

d |> Filter(f = is.numeric) |> colSums() |> prop.table()
##    a    b    c 
## 0.24 0.36 0.40
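To connect this with the 6/25, 9/25 and 10/25 asked for in the question, the same pipeline can be unrolled into intermediate steps. This is only a sketch; the variable names (num, col_totals, grand_total, props) are made up for illustration, and it avoids the native pipe so it also runs on R versions before 4.1.

```r
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

num         <- Filter(is.numeric, d)    # keep only the numeric columns a, b, c
col_totals  <- colSums(num)             # a = 6, b = 9, c = 10
grand_total <- sum(col_totals)          # 25
props       <- col_totals / grand_total # what prop.table(col_totals) computes
props
##    a    b    c 
## 0.24 0.36 0.40
```

So prop.table() here is just each column total divided by the grand total of 25.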

2) dplyr With dplyr:

library(dplyr)

d %>%
  select(where(is.numeric)) %>%
  summarize(across(everything(), sum) / sum(.))
##      a    b   c
## 1 0.24 0.36 0.4

2a) or

d %>%
  summarize(across(where(is.numeric), sum)) %>%
  { . / sum(.) }

2b) The scoped functions, such as the *_if functions, have been superseded by across and are rarely used these days, but they are still available. If you want to use them anyway, try this, which is close to the code in the question:

d %>%
  summarize_if(is.numeric, sum) %>%
  { . / sum(.) }
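As for why the attempt in the question fails, one likely cause is that colSums(d) and sum(d) are handed the whole data frame, including the character column sample. The base R sketch below captures both errors with tryCatch() and then shows that the same calls work once only the numeric columns are kept. The names sum_err, colsums_err and num are made up for illustration.

```r
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))

# Both calls see the character column 'sample' and signal an error;
# tryCatch() captures the message instead of stopping the script.
sum_err     <- tryCatch(sum(d),     error = function(e) conditionMessage(e))
colsums_err <- tryCatch(colSums(d), error = function(e) conditionMessage(e))

# Restricting to the numeric columns first makes both calls work.
num <- d[sapply(d, is.numeric)]
colSums(num) / sum(num)
```

Even with numeric columns only, mutate expects one value per row, whereas colSums returns one value per column, which is presumably why the alternatives here use summarize instead.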

3) collapse With the collapse package, get the numeric variables (nv), sum each column (fsum) and then take proportions. When I benchmarked it on this data it ran 3x faster than (1), over 100x faster than (2) and 300x faster than (4).

library(collapse)
d |> nv() |> fsum() |> fsum(TRA = "/")
##    a    b    c 
## 0.24 0.36 0.40

4) dplyr/tidyr With tidyr and dplyr we can convert to long form, process and convert back.

library(dplyr)
library(tidyr)
d %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(value = sum(value) / sum(.$value), .groups = "drop") %>%
  pivot_wider()
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1  0.24  0.36   0.4

TarJae · Answer

We could use colSums and divide by the sum of the column sums. The -1 index excludes the first column (sample) from the calculation:

result <- colSums(d[,-1])/sum(colSums(d[,-1]))
result

Output:

   a    b    c 
0.24 0.36 0.40
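The result above is a named numeric vector, while the question asks for a one-row data frame. A minimal way to convert it, assuming base R only (result_df is a name made up for illustration):

```r
d <- data.frame(sample = c("a2", "a3"), a = c(1, 5), b = c(4, 5), c = c(6, 4))
result <- colSums(d[, -1]) / sum(colSums(d[, -1]))

# t() turns the named vector into a 1 x 3 matrix whose column names are a, b, c;
# as.data.frame() then yields a one-row data frame with those columns.
result_df <- as.data.frame(t(result))
result_df
##      a    b   c
## 1 0.24 0.36 0.4
```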

Tags: r, dplyr

Danby
