Suppose I have the following data frame:
Base Coupled Derived Decl
1 0 0 1
1 7 0 1
1 1 0 1
2 3 12 1
1 0 4 1
Here is the dput output:
temp <- structure(list(Base = c(1L, 1L, 1L, 2L, 1L), Coupled = c(0L,7L, 1L, 3L, 0L), Derived = c(0L, 0L, 0L, 12L, 4L), Decl = c(1L, 1L, 1L, 1L, 1L)), .Names = c("Base", "Coupled", "Derived", "Decl"), row.names = c(NA, 5L), class = "data.frame")
I want to compute the median for each column. Then, for each row, I want to count the number of cell values greater than the median for their respective columns and append this as a column called AboveMedians.
In the example, the medians would be c(1,1,0,1). The resulting table I want would be
Base Coupled Derived Decl AboveMedians
1 0 0 1 0
1 7 0 1 1
1 1 0 1 0
2 3 12 1 3
1 0 4 1 1
What is the elegant R way to do this? I have something involving a for-loop and sapply, but this doesn't seem optimal.
Thanks.
We can use colMedians from matrixStats after converting the data.frame to a matrix.
library(matrixStats)
Medians <- colMedians(as.matrix(temp))
Medians
#[1] 1 1 0 1
Then, replicate the 'Medians' to make the dimensions equal to that of 'temp', do the comparison and get the rowSums on the logical matrix.
temp$AboveMedians <- rowSums(temp > Medians[col(temp)])
temp$AboveMedians
#[1] 0 1 0 3 1
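To make the replication step above more concrete, here is a minimal base R sketch that spells out what Medians[col(temp)] produces. The intermediate names (m, medians, median_grid, above, above_medians) are introduced only for illustration and are not part of the original answer; the medians are computed with apply here so that the sketch runs without matrixStats.

```r
# Rebuild the example data from the question.
temp <- data.frame(
  Base    = c(1L, 1L, 1L, 2L, 1L),
  Coupled = c(0L, 7L, 1L, 3L, 0L),
  Derived = c(0L, 0L, 0L, 12L, 4L),
  Decl    = c(1L, 1L, 1L, 1L, 1L)
)

m <- as.matrix(temp)

# One median per column.
medians <- apply(m, 2, median)

# col(m) holds the column index of every cell, so indexing the medians
# with it repeats each column's median once per row.
median_grid <- matrix(medians[col(m)], nrow = nrow(m), dimnames = dimnames(m))

# Cell-by-cell comparison, then count the TRUE values in each row.
above <- m > median_grid
above_medians <- rowSums(above)
above_medians
```

Every column of median_grid is constant, which is why the comparison lines up each cell with the median of its own column.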
A base R-only option is:
apply(temp, 2, median)
# Base Coupled Derived Decl
# 1 1 0 1
rowSums(sweep(temp, 2, apply(temp, 2, median), FUN = ">"))
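If this is needed more than once, the base R approach could be wrapped in a small helper. The sketch below is one possible way to do that, not the method from the answer above: count_above_median is a hypothetical name, and it assumes every column is numeric and contains no NA values. It works column by column with vapply and Map rather than apply, because apply first coerces the data frame to a matrix, which would turn everything into character if a non-numeric column were present.

```r
# Hypothetical helper: for each row, count the cells that exceed
# the median of their own column. Assumes all-numeric columns, no NAs.
count_above_median <- function(df) {
  stopifnot(is.data.frame(df), all(vapply(df, is.numeric, logical(1))))

  # One median per column, as a named numeric vector.
  medians <- vapply(df, median, numeric(1))

  # One logical vector per column: is the cell above that column's median?
  flags <- Map(function(column, m) column > m, df, medians)

  # Add the logical vectors element-wise to get per-row counts.
  as.integer(Reduce(`+`, flags))
}

temp <- data.frame(
  Base    = c(1L, 1L, 1L, 2L, 1L),
  Coupled = c(0L, 7L, 1L, 3L, 0L),
  Derived = c(0L, 0L, 0L, 12L, 4L),
  Decl    = c(1L, 1L, 1L, 1L, 1L)
)

temp$AboveMedians <- count_above_median(temp)
temp$AboveMedians
# [1] 0 1 0 3 1
```

Note that the helper must be called before AboveMedians is appended; otherwise the new column would itself be included in the comparison.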
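Another alternative uses by_row. Note that by_row was originally part of purrr and was later moved to the purrrlyr package, so load purrrlyr instead if by_row is not found: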
library(dplyr)
library(purrr)
temp %>%
  by_row(function(x) {
    sum(x > summarise_each(., funs(median)))
  },
  .to = "AboveMedian",
  .collate = "cols"
  )
Which gives:
#Source: local data frame [5 x 5]
#
# Base Coupled Derived Decl AboveMedian
# <int> <int> <int> <int> <int>
#1 1 0 0 1 0
#2 1 7 0 1 1
#3 1 1 0 1 0
#4 2 3 12 1 3
#5 1 0 4 1 1