
R fill missing values with the sum of other values using tidyverse

I have a data frame with many columns and many rows.

col_1 | col_2 | ... | col_n
 35   |  NA   | ... |   2
  .   |   .   |  .  |   .
  .   |   .   |  .  |   .
  .   |   .   |  .  |   .
 123  |  90   | ... |   NA

Some rows contain NA values (can be more than 1 NA).

I want to find every row that contains exactly one NA and replace that NA with the sum of the other columns in that row.

How can I achieve this using the tidyverse?

Kevin asked Dec 14 '22 07:12


1 Answer

I used toy data from Anil Goyal (Thanks!)

There was a similar question today; see: R: Replace NA with other variables in the df using tidyverse

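Here we:

  1. sum each row, ignoring NAs
  2. count the NAs in each row
  3. use across() with case_when() to apply the condition to col_1 through col_4: where a row has exactly one NA, coalesce() fills it with the row sum
  4. and the part I love most: .keep = "unused" removes the "helper" columns (rowsum1 and count_na).

library(dplyr)

df %>% 
  # helper: row sum, ignoring NAs
  mutate(rowsum1 = rowSums(., na.rm = TRUE)) %>%
  # helper: number of NAs per row (rowsum1 itself is never NA)
  mutate(count_na = rowSums(is.na(.))) %>% 
  # rows with exactly one NA: fill it with the row sum; other rows unchanged
  mutate(across(starts_with("col"),
                ~ case_when(count_na == 1 ~ coalesce(as.numeric(.x), rowsum1),
                            TRUE ~ as.numeric(.x))),
         .keep = "unused")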

Output:

  col_1 col_2 col_3 col_4
1    35   421  1223   767
2    43    54   435    78
3   234    NA    NA    65
4   784     8   687    89
5    23    45    78   146
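
To try this end to end, here is a minimal self-contained sketch. The input data frame is an assumption: it is reconstructed by blanking the cells that the output above shows as filled (rows 1, 4 and 5 each get one NA, row 3 keeps its two NAs), so it may differ from the toy data actually used. The name `result` is only for illustration.

```r
library(dplyr)

# Hypothetical input, reconstructed from the output shown above.
# Rows 1, 4 and 5 have exactly one NA; row 3 has two; row 2 has none.
df <- data.frame(
  col_1 = c(35, 43, 234, NA, 23),
  col_2 = c(421, 54, NA, 8, 45),
  col_3 = c(NA, 435, NA, 687, 78),
  col_4 = c(767, 78, 65, 89, NA)
)

result <- df %>%
  # helper columns: row sum ignoring NAs, then the NA count per row
  mutate(rowsum1 = rowSums(., na.rm = TRUE)) %>%
  mutate(count_na = rowSums(is.na(.))) %>%
  # fill the single NA with the row sum; rows with 0 or 2+ NAs are unchanged
  mutate(across(starts_with("col"),
                ~ case_when(count_na == 1 ~ coalesce(.x, rowsum1),
                            TRUE ~ .x)),
         .keep = "unused")

print(result)
```

Row 3 keeps both of its NAs because its NA count is 2, so the `count_na == 1` branch never applies to it.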
TarJae answered Feb 16 '23 01:02