Within each group, I need to replace the NAs in a given column with the non-NA value from another row of the same group.
Say the sample data looks like this:
id name
1 a
1 NA
2 b
3 NA
3 c
3 NA
Desired output:
id name
1 a
1 a
2 b
3 c
3 c
3 c
Is there a way to do this in R?
You can replace NA values with zero (0) in the numeric columns of an R data frame by using is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_at(), dplyr::mutate_if(), and tidyr::replace_na().
You can replace NA values with a blank string in the columns of an R data frame (data.frame) by using is.na() and replace(). Use dplyr::mutate_if() to replace only in character columns when you have mixed numeric and character columns, and dplyr::mutate_at() to replace in multiple selected columns by index or name.
Here is an approach using dplyr. From the data frame x, we group by id and replace NA with the relevant values. I am assuming one unique value of name per id.
x <- data.frame(id = c(1, 1, 2, rep(3, 3)),
                name = c("a", NA, "b", NA, "c", NA), stringsAsFactors = FALSE)
library(dplyr)
# Within each id, overwrite name with the group's single non-NA value
x %>%
  group_by(id) %>%
  mutate(name = unique(name[!is.na(name)]))
# Source: local data frame [6 x 2]
# Groups: id
# id name
#1 1 a
#2 1 a
#3 2 b
#4 3 c
#5 3 c
#6 3 c
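The mutate() call above relies on every id having exactly one distinct non-NA name. Where that may not hold (for instance, a group that is entirely NA), a fill-based variant is one alternative. This is only a sketch, assuming tidyr (1.0.0 or later, for the "downup" direction) is installed; the object name filled is purely illustrative.

```r
library(dplyr)
library(tidyr)

x <- data.frame(id = c(1, 1, 2, rep(3, 3)),
                name = c("a", NA, "b", NA, "c", NA),
                stringsAsFactors = FALSE)

# Within each id, carry the last seen value downward, then fill any
# leading NAs upward from the next available value.
filled <- x %>%
  group_by(id) %>%
  fill(name, .direction = "downup") %>%
  ungroup()

filled
```

Note that if a group holds several different non-NA values, each NA takes its nearest neighbour rather than a single group-wide value, and an all-NA group simply stays NA.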
We can use data.table to do this. Convert the 'data.frame' to 'data.table' (setDT(df1)). Grouped by 'id', we replace the 'name' with the non-NA value in 'name'.
library(data.table) # v1.9.5+
setDT(df1)[, name := name[!is.na(name)][1L], by = id]
df1
# id name
#1: 1 a
#2: 1 a
#3: 2 b
#4: 3 c
#5: 3 c
#6: 3 c
NOTE: Here I assumed that there is only a single unique non-NA value within each 'id' group.
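That assumption can be checked before filling. The following is a minimal sketch, run on a fresh copy of the sample data; the names counts, n_values and ambiguous are illustrative, and it assumes a data.table version whose uniqueN() accepts na.rm.

```r
library(data.table)

df1 <- data.table(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA))

# Count the distinct non-NA names within each id
counts <- df1[, .(n_values = uniqueN(name, na.rm = TRUE)), by = id]

# Any id listed here has more than one candidate value,
# so taking the first non-NA value would be an arbitrary choice
ambiguous <- counts[n_values > 1L]
ambiguous
```

For the sample data, ambiguous comes back empty, so taking the first non-NA value per group is safe.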
Or another option would be to join the dataset with the unique rows of the data after we order by 'id' and 'name'.
setDT(df1)
df1[unique(df1[order(id, name)], by='id'), on='id', name:= i.name][]
# id name
#1: 1 a
#2: 1 a
#3: 2 b
#4: 3 c
#5: 3 c
#6: 3 c
NOTE: At the time of writing, the on argument was only available in the devel version of data.table (it has since shipped in CRAN releases, from v1.9.6 onward). Instructions to install the devel version are here.
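To see why the join works, it can help to inspect the intermediate lookup table on its own. The sketch below rebuilds the sample data and keeps the first row per id after ordering; the name lookup is illustrative. It relies on order() inside [ ] placing NA last, so the retained row for each id carries the non-NA name whenever one exists.

```r
library(data.table)

df1 <- data.table(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA))

# Sort by id, then name (NA sorts last), and keep the first row of each id
lookup <- unique(df1[order(id, name)], by = "id")
lookup

# Update join: copy the looked-up name onto every matching row of df1
df1[lookup, on = "id", name := i.name]
df1
```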
df1 <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L), name = c("a",
NA, "b", NA, "c", NA)), .Names = c("id", "name"),
class = "data.frame", row.names = c(NA, -6L))