I am working with a data frame in R
which has some missing values across rows. Data frame is next (dput
added in the end):
df
id V1 V2 V3 V4
1 01 1 1 1 NA
2 02 2 1 NA NA
3 03 3 1 NA NA
4 04 4 1 2 NA
Each row is a different id
. As you can see the rows have missing values. I would like to know how can I get a dataframe completed in this style without using reshape to long or pivot as my real data is very large:
df
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
I was trying to use fill
from tidyr
but at row level I am having issues. I have seen some posts where it is used along with dplyr
function across
but I can not find it. I have tried using group_by(id)
and rowwise
but I have not had success. Also only the variables/columns starting with V
should be filled with previous values.
Data is next:
#Data
df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3,
4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA,
NA, NA)), class = "data.frame", row.names = c(NA, -4L))
Many thanks for your time.
One solution could be using na.locf
function from package zoo
combining with pmap
function in a row-wise operation. na.locf
takes the most recent non-NA
value and replace all the upcoming NA
values by that. Just as a reminder c(...)
in both solutions captures all values of V1:V4
in each row in every iteration. However, I excluded id
column in both as it is not involved in the our calculations.
library(zoo)
library(purrr)
df %>%
mutate(pmap_df(., ~ na.locf(c(...)[-1])))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or we can use coalesce
function from dplyr
. We can replace every NA
values in each row with the last non-NA
value, something we did earlier with na.locf
. However this solution is a bit verbose:
df %>%
mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
coalesce(c(...), x[length(x)])}))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or you could also use this:
library(purrr)
df %>%
mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
The warning message can be ignored. It is in fact produced because we have 6 NA
values but the result of applying dplyr::coalesce
on every vector is 1 element resulting in 4 elements to replace 6 slots.
One option using dplyr
could be:
df %>%
mutate(across(-id, ~ ifelse(is.na(.), coalesce(!!!select(., V4:V1)), .)))
id V1 V2 V3 V4
1 1 1 1 1 1
2 2 2 1 1 1
3 3 3 1 1 1
4 4 4 1 2 2
A dplyr
approach
df <- structure(list(id = c("01", "02", "03", "04"), V1 = c(1, 2, 3,
4), V2 = c(1, 1, 1, 1), V3 = c(1, NA, NA, 2), V4 = c(NA, NA,
NA, NA)), class = "data.frame", row.names = c(NA, -4L))
library(dplyr, warn.conflicts = F)
df %>% mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[-1][!is.na(cur_data()[-1])],1))))
#> id V1 V2 V3 V4
#> 1 01 1 1 1 2
#> 2 02 2 1 2 2
#> 3 03 3 1 2 2
#> 4 04 4 1 2 2
If you'll group_by
on id
column, you won't have to use [-1] on
cur_data()`
df %>% group_by(id) %>%
mutate(across(V1:V4, ~ coalesce(., tail(cur_data()[!is.na(cur_data())],1))))
A data.table
option with nafill
> setDT(df)[, setNames(as.list(nafill(unlist(.SD), type = "locf")), names(.SD)), id]
id V1 V2 V3 V4
1: 01 1 1 1 1
2: 02 2 1 1 1
3: 03 3 1 1 1
4: 04 4 1 2 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With