
R: How to Take the Median of Rows in a Data Frame

Tags:

r

median

I am wondering if there is any way to compute the median of each row in a data frame. I understand the function rowMeans exists, but I do not believe there is a row median function. I would like to store the results in a new column in the data frame. Here is my example:

I tried to look online. There was one mention of row medians, but I could not find the function in R.

   C1 <- c(3, 2, 4, 4, 5)
   C2 <- c(3, 7, 3, 4, 5)
   C3 <- c(5, 4, 3, 6, 3)
   DF <- data.frame(ID = c("A", "B", "C", "D", "E"), C1 = C1, C2 = C2, C3 = C3)

   DF


   # This is as far as I have gotten, but it is not streamlined

   MA <- median(c(3, 3, 5), na.rm = TRUE)   # A
   MB <- median(c(2, 7, 4), na.rm = TRUE)   # B
   MC <- median(c(4, 3, 3), na.rm = TRUE)   # C
   MD <- median(c(4, 4, 6), na.rm = TRUE)   # D
   ME <- median(c(5, 5, 3), na.rm = TRUE)   # E

   CM <- c(MA, MB, MC, MD, ME)


Current data frame:

     ID C1 C2 C3
   1  A  3  3  5
   2  B  2  7  4
   3  C  4  3  3
   4  D  4  4  6
   5  E  5  5  3

Desired output:

     ID C1 C2 C3 CM
   1  A  3  3  5  3
   2  B  2  7  4  4
   3  C  4  3  3  3
   4  D  4  4  6  4
   5  E  5  5  3  5

Is there any way I can streamline the process so it would be something like DF$CM <- median(...)?

Nick Benelli asked Jan 25 '19 13:01


3 Answers

To calculate the median of each row, apply median across the rows (MARGIN = 1) of the numeric columns:

DF$CM <- apply(DF[, c("C1", "C2", "C3")], 1, median, na.rm = TRUE)
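For reference, here is a self-contained sketch of the apply approach on the question's data. It passes only the numeric columns to apply, because apply converts its input to a matrix and a character ID column would turn every value into a string. The helper name num_cols is only illustrative, and the sketch assumes every numeric column should count toward the median.

```r
# Rebuild the example data frame from the question
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3)
)

# Flag the numeric columns (assumption: all of them belong in the median)
num_cols <- vapply(DF, is.numeric, logical(1))

# MARGIN = 1 makes apply() call median() once per row
DF$CM <- apply(DF[, num_cols, drop = FALSE], 1, median, na.rm = TRUE)

DF
```

Selecting the columns before calling apply also means the new CM column is not fed back into the calculation if the line is run a second time with an explicit column list.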
Jiaqi answered Nov 17 '22 06:11


If you would like to use dplyr, you can find an example here, especially mpalanco's answer. Briefly, rowwise tells dplyr to apply the following operations to each row separately (rather than to the entire data frame, which is the default), and mutate then calculates and names a new column from a selection of existing columns. Check out the documentation on each of those functions for more details.

E.g.,

library(dplyr)

DF %>% 
  rowwise() %>% 
  mutate(CM = median(c(C1, C2, C3), na.rm = TRUE))

will yield the output:

# A tibble: 5 x 5
  ID       C1    C2    C3    CM
  <fct> <dbl> <dbl> <dbl> <dbl>
1 A         3     3     5     3
2 B         2     7     4     4
3 C         4     3     3     3
4 D         4     4     6     4
5 E         5     5     3     5
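One thing to keep in mind with this approach is that the result of rowwise stays grouped by row, so later verbs keep working one row at a time until the grouping is dropped. A minimal sketch, assuming a reasonably recent dplyr and the question's DF; the name DF_with_median and the follow-up summary are only for illustration:

```r
library(dplyr)

# Example data from the question
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3)
)

DF_with_median <- DF %>%
  rowwise() %>%
  mutate(CM = median(c(C1, C2, C3), na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping again

# After ungroup(), a summary sees the whole CM column at once
overall <- summarise(DF_with_median, overall = mean(CM))
```

Without the ungroup() call, the summarise step would return one value per row instead of a single overall mean.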
Michael Hutson answered Nov 17 '22 06:11


Here is a slightly more flexible and up-to-date version. Combining c_across with rowwise lets you pick the columns using tidy-select semantics. Here, where(is.numeric) specifies that only the numeric columns are used to calculate the median.

library(dplyr)

DF %>%
  rowwise() %>%
  mutate(med = median(c_across(where(is.numeric)), na.rm = TRUE))

# A tibble: 5 x 5
# Rowwise: 
  ID       C1    C2    C3   med
  <chr> <dbl> <dbl> <dbl> <dbl>
1 A         3     3     5     3
2 B         2     7     4     4
3 C         4     3     3     3
4 D         4     4     6     4
5 E         5     5     3     5
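Because c_across accepts any tidy-select expression, an explicit column range works as well. The sketch below assumes dplyr 1.0.0 or later and introduces a hypothetical missing value (row B, column C2, which is not in the original data) to show what na.rm = TRUE does; DF_na and res are illustrative names.

```r
library(dplyr)

# The question's data with one hypothetical NA in row B
DF_na <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, NA, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3)
)

res <- DF_na %>%
  rowwise() %>%
  # C1:C3 selects the columns from C1 through C3 by position
  mutate(med = median(c_across(C1:C3), na.rm = TRUE))

res$med
```

For row B the median is taken over the two remaining values, 2 and 4, so it comes out as 3 rather than NA.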
Anoushiravan R answered Nov 17 '22 06:11