Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace NA's in numerical columns with the median of those columns? [duplicate]

I am working on a data frame with multiple data types.I would like to replace NA values only on numerical columns with the median of that particular column.I have seen questions on replacing with mean a lot, but not median. My df is similar to the following code:

my_groups <- c(rep("A", 5), rep("B",5))
my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
my_df <- data.frame(my_groups, my_values_1, my_values_2)
my_df %>% select_if(is.numeric)

This gives me numerical columns, but I cant figure out the next step.

like image 283
Tommy Avatar asked Jan 21 '26 22:01

Tommy


1 Answers

Here are several approaches. The test data frame DF is defined in (1) and used in the other approaches as well.

1) dplyr - across/coalesce

library(dplyr)

# test data
DF <- data.frame(a = c(NA, NA, 1, 2), b = 1:4, c = letters[1:4])

DF %>% 
  mutate(across(where(is.numeric), ~ coalesce(., median(., na.rm = TRUE))))

giving:

    a b c
1 1.5 1 a
2 1.5 2 b
3 1.0 3 c
4 2.0 4 d

2) dplyr/tidyr - across/replace_na

library(dplyr)
library(tidyr)

DF %>% 
  mutate(across(where(is.numeric), ~ replace_na(., median(., na.rm = TRUE))))

3) zoo - na.aggregate

library(zoo)

ok <- sapply(DF, is.numeric)
replace(DF, ok, na.aggregate(DF[ok], FUN = median))

4) Base R

na.median <- function(x) replace(x, is.na(x), median(x, na.rm = TRUE))   
ok <- sapply(DF, is.numeric)
replace(DF, ok, lapply(DF[ok], na.median))

5) Base R - S3

na.median <- function(x, ...) UseMethod("na.median")
na.median.default <- identity
na.median.numeric <- function(x, ...) {
  replace(x, is.na(x), median(x, na.rm = TRUE))   
}

replace(DF, TRUE, lapply(DF, na.median))

6) magrittr We first make a copy of DF to avoid clobbering it -- although not recommend you can just use DF in last line if you are ok with overwriting it -- and use magrittr %<>%. na.median is from (4).

library(magrittr)

DF2 <- DF
DF2[sapply(DF2, is.numeric)] %<>% lapply(na.median)

7) collapse - ftmv ftmv or its synonym ftransformv provide a compact expression. This uses na.median is from (4).

library(collapse)

tfmv(DF, is.numeric, na.median)
like image 161
G. Grothendieck Avatar answered Jan 23 '26 10:01

G. Grothendieck



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!