
Replace NA with values from another row of the same column for each group in R

Tags:

r

I need to replace the NAs in a given column with the non-NA value found in another row of the same group.

Let's say the sample data looks like this:

id   name
 1     a
 1     NA
 2     b
 3     NA
 3     c
 3     NA

Desired output:

id   name
 1     a
 1     a
 2     b
 3     c
 3     c
 3     c

Is there a way to do this in R?

Dheeraj Singh asked Aug 07 '15 13:08



2 Answers

Here is an approach using dplyr. Starting from the data frame x, we group by id and replace each NA with the non-NA value from the same group. This assumes exactly one unique non-NA value of name per id.

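# Sample data from the question
x <- data.frame(id = c(1, 1, 2, rep(3, 3)),
                name = c("a", NA, "b", NA, "c", NA),
                stringsAsFactors = FALSE)

library(dplyr)

# Within each id, overwrite name with its single non-NA value
x %>%
  group_by(id) %>%
  mutate(name = unique(name[!is.na(name)]))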

# Source: local data frame [6 x 2]
# Groups: id

#  id name
#1  1    a
#2  1    a
#3  2    b
#4  3    c
#5  3    c
#6  3    c
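
The approach above depends on each id having exactly one distinct non-NA name; if a group has none or several, the mutate() call may fail or give unexpected results. A minimal base R sketch for checking that assumption first is shown here. The names n_distinct_names and assumption_holds are hypothetical and not part of the original answer.

```r
# Same sample data as in the question
x <- data.frame(id = c(1, 1, 2, 3, 3, 3),
                name = c("a", NA, "b", NA, "c", NA),
                stringsAsFactors = FALSE)

# Count the distinct non-NA names within each id
# (n_distinct_names is a hypothetical name used for illustration)
n_distinct_names <- tapply(x$name, x$id,
                           function(v) length(unique(v[!is.na(v)])))

# TRUE only if every id has exactly one distinct non-NA name
assumption_holds <- all(n_distinct_names == 1)
assumption_holds
```

If assumption_holds is FALSE, inspecting n_distinct_names shows which ids have zero or multiple candidate values.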

Whitebeard answered Nov 02 '22 06:11


We can use data.table to do this. Convert the 'data.frame' to a 'data.table' by reference with setDT(df1). Then, grouped by 'id', we assign (:=) to 'name' the first non-NA value of 'name' in that group.

library(data.table)  # v1.9.5+
setDT(df1)[, name := name[!is.na(name)][1L], by = id]
df1
#   id name
#1:  1    a
#2:  1    a
#3:  2    b
#4:  3    c
#5:  3    c
#6:  3    c

NOTE: Here I assumed that there is only a single unique non-NA value within each 'id' group.
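
For comparison, the same "first non-NA value per group" idea can be sketched in base R with ave(), without any packages. This is an illustration under the same single-value assumption, not part of the original answer; the helper name first_non_na is hypothetical.

```r
# Same data as df1 in this answer
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# Hypothetical helper: first non-NA element of v, or NA if there is none
first_non_na <- function(v) v[!is.na(v)][1L]

# ave() applies the helper within each id group and
# recycles the single result over every row of that group
df1$name <- ave(df1$name, df1$id, FUN = first_non_na)
df1
```

A group that contains only NA values stays NA here, because indexing an empty vector with [1L] returns NA.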

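Another option is to join the dataset with its own unique rows per 'id', after ordering by 'id' and 'name'. Ordering places the NA values last within each 'id', so the row kept for each 'id' carries the non-NA 'name'.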

 setDT(df1)
 df1[unique(df1[order(id, name)], by='id'), on='id', name:= i.name][]
 #   id name
 #1:  1    a
 #2:  1    a
 #3:  2    b
 #4:  3    c
 #5:  3    c
 #6:  3    c

NOTE: At the time of writing, the on argument was only available in the devel version of data.table (v1.9.5); it has been part of the CRAN releases since v1.9.6.
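
The join idea can also be sketched in base R with a lookup table and match(). Note that this sketch swaps in a different technique: it filters out the NA rows instead of ordering them last, and it is not the data.table join itself. The name lookup is hypothetical.

```r
# Same data as df1 in this answer
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                  name = c("a", NA, "b", NA, "c", NA),
                  stringsAsFactors = FALSE)

# Lookup table built from the distinct rows where name is not NA
# (one row per id under the single-value assumption)
lookup <- unique(df1[!is.na(df1$name), ])

# For each row of df1, find the position of its id in the lookup
# and pull the corresponding name
df1$name <- lookup$name[match(df1$id, lookup$id)]
df1
```

match() returns the first matching position, so if an id had several distinct non-NA names, the one appearing first in the data would be used.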

data

df1 <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L),
                      name = c("a", NA, "b", NA, "c", NA)),
                 .Names = c("id", "name"),
                 class = "data.frame", row.names = c(NA, -6L))

akrun answered Nov 02 '22 08:11