I am working with a data frame in R which has some missing values across rows. Data frame is next (dput added in the end):
df
  id V1 V2 V3 V4
1 01  1  1  1 NA
2 02  2  1 NA NA
3 03  3  1 NA NA
4 04  4  1  2 NA
Each row is a different id. As you can see the rows have missing values. I would like to know how can I get a dataframe completed in this style without using reshape to long or pivot as my real data is very large:
df
  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2
I was trying to use fill from tidyr but at row level I am having issues. I have seen some posts where it is used along with dplyr function across but I can not find it. I have tried using group_by(id) and rowwise but I have not had success. Also only the variables/columns starting with V should be filled with previous values.
Data is next:
#Data
df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3, 
4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA, 
NA, NA)), class = "data.frame", row.names = c(NA, -4L))
Many thanks for your time.
One solution could be using na.locf function from package zoo combining with pmap function in a row-wise operation. na.locf takes the most recent non-NA value and replace all the upcoming NA values by that. Just as a reminder c(...) in both solutions captures all values of V1:V4 in each row in every iteration. However, I excluded id column in both as it is not involved in the our calculations.
library(zoo)
library(purrr)
df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))
  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2
Or we can use coalesce function from dplyr. We can replace every NA values in each row with the last non-NA value, something we did earlier with na.locf. However this solution is a bit verbose:
df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))]; 
  coalesce(c(...), x[length(x)])}))
  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2
Or you could also use this:
library(purrr)
df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))
  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2
The warning message can be ignored. It is in fact produced because we have 6 NA values but the result of applying dplyr::coalesce on every vector is 1 element resulting in 4 elements to replace 6 slots.
One option using dplyr could be:
df %>%
 mutate(across(-id, ~ ifelse(is.na(.), coalesce(!!!select(., V4:V1)), .)))
  id V1 V2 V3 V4
1  1  1  1  1  1
2  2  2  1  1  1
3  3  3  1  1  1
4  4  4  1  2  2
                        A dplyr approach
df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3, 
                                                            4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA, 
                                                                                                                 NA, NA)), class = "data.frame", row.names = c(NA, -4L))
library(dplyr, warn.conflicts = F)
df %>% mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[-1][!is.na(cur_data()[-1])],1))))
#>   id V1 V2 V3 V4
#> 1 01  1  1  1  2
#> 2 02  2  1  2  2
#> 3 03  3  1  2  2
#> 4 04  4  1  2  2
If you'll group_by on id column, you won't have to use [-1] on cur_data()`
df %>% group_by(id) %>%
  mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[!is.na(cur_data())],1))))
                        A data.table option with nafill
> setDT(df)[, setNames(as.list(nafill(unlist(.SD), type = "locf")), names(.SD)), id]
   id V1 V2 V3 V4
1: 01  1  1  1  1
2: 02  2  1  1  1
3: 03  3  1  1  1
4: 04  4  1  2  2
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With