Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill missing values with previous values by row using dplyr

Tags:

r

dplyr

I am working with a data frame in R which has some missing values across rows. Data frame is next (dput added in the end):

df
  id V1 V2 V3 V4
1 01  1  1  1 NA
2 02  2  1 NA NA
3 03  3  1 NA NA
4 04  4  1  2 NA

Each row is a different id. As you can see the rows have missing values. I would like to know how can I get a dataframe completed in this style without using reshape to long or pivot as my real data is very large:

df
  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

I was trying to use fill from tidyr but at row level I am having issues. I have seen some posts where it is used along with dplyr function across but I can not find it. I have tried using group_by(id) and rowwise but I have not had success. Also only the variables/columns starting with V should be filled with previous values.

Data is next:

#Data
df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3, 
4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA, 
NA, NA)), class = "data.frame", row.names = c(NA, -4L))

Many thanks for your time.

like image 602
Duck Avatar asked Jun 16 '21 13:06

Duck


Video Answer


4 Answers

One solution could be using na.locf function from package zoo combining with pmap function in a row-wise operation. na.locf takes the most recent non-NA value and replace all the upcoming NA values by that. Just as a reminder c(...) in both solutions captures all values of V1:V4 in each row in every iteration. However, I excluded id column in both as it is not involved in the our calculations.

library(zoo)
library(purrr)

df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or we can use coalesce function from dplyr. We can replace every NA values in each row with the last non-NA value, something we did earlier with na.locf. However this solution is a bit verbose:

df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))]; 
  coalesce(c(...), x[length(x)])}))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or you could also use this:

library(purrr)

df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

The warning message can be ignored. It is in fact produced because we have 6 NA values but the result of applying dplyr::coalesce on every vector is 1 element resulting in 4 elements to replace 6 slots.

like image 68
Anoushiravan R Avatar answered Oct 25 '22 19:10

Anoushiravan R


One option using dplyr could be:

df %>%
 mutate(across(-id, ~ ifelse(is.na(.), coalesce(!!!select(., V4:V1)), .)))

  id V1 V2 V3 V4
1  1  1  1  1  1
2  2  2  1  1  1
3  3  3  1  1  1
4  4  4  1  2  2
like image 27
tmfmnk Avatar answered Oct 25 '22 20:10

tmfmnk


A dplyr approach

df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3, 
                                                            4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA, 
                                                                                                                 NA, NA)), class = "data.frame", row.names = c(NA, -4L))


library(dplyr, warn.conflicts = F)

df %>% mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[-1][!is.na(cur_data()[-1])],1))))
#>   id V1 V2 V3 V4
#> 1 01  1  1  1  2
#> 2 02  2  1  2  2
#> 3 03  3  1  2  2
#> 4 04  4  1  2  2

If you'll group_by on id column, you won't have to use [-1] on cur_data()`

df %>% group_by(id) %>%
  mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[!is.na(cur_data())],1))))

like image 23
AnilGoyal Avatar answered Oct 25 '22 20:10

AnilGoyal


A data.table option with nafill

> setDT(df)[, setNames(as.list(nafill(unlist(.SD), type = "locf")), names(.SD)), id]
   id V1 V2 V3 V4
1: 01  1  1  1  1
2: 02  2  1  1  1
3: 03  3  1  1  1
4: 04  4  1  2  2
like image 34
ThomasIsCoding Avatar answered Oct 25 '22 19:10

ThomasIsCoding