Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fill NAs based on a grouped variable?

Tags:

r

My test data:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    NA       NA
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020

I have some records of a same person (in this case david), which have both valid (100 and 01/01/2020) and NA data in weight and weightdate. Im looking to normalize the NA by replacing them with valid data from that person. For the case of anne who doesnt have any valid data, i would leave it as is. I was thinking about using fill.

My desired output:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    100      01/01/2020
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020
like image 698
Andres Mora Avatar asked Jan 25 '23 04:01

Andres Mora


2 Answers

You can use fill() function from tidyr package:

library(tidyr)

df1 %>% group_by(name) %>% fill(weight, weightdate) %>% ungroup
# A tibble: 6 x 5
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

Data

df1 <- structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
    name = c("david", "david", "david", "anne", "peter", "peter"
    ), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
    "01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))
like image 134
Andre Wildberg Avatar answered Feb 01 '23 16:02

Andre Wildberg


This is an alternative. Andre Wildberg already provided my favorite:

library(dplyr)
df %>% 
  group_by(name) %>% 
  mutate(across(everything(), ~ifelse(is.na(.), lag(.),.)))
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

data:

structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
name = c("david", "david", "david", "anne", "peter", "peter"
), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
"01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))
like image 24
TarJae Avatar answered Feb 01 '23 17:02

TarJae