
Fill 'NA's in data frame with information contained in one of the rows with a patient's ID using R

Tags: r

I have the following data frame in R:

ID  Information
1    Yes
1    NA
1    NA
1    Yes
2    No
2    NA
2    NA
3    NA
3    NA
3    Maybe
3    NA

I need to fill in the rows that contain NAs with whatever information is contained in one of the other rows for the same ID. I would like to end up with this:

ID  Information
1   Yes
1   Yes
1   Yes
1   Yes
2   No
2   No
2   No
3   Maybe
3   Maybe
3   Maybe
3   Maybe

As far as I know, the information (i.e. Yes/No/Maybe) never conflicts within an ID, but it may be repeated. (Sorry about the ugly format, I am a newbie and may not post pictures.)
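For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data in base R and checks the "no conflicts within an ID" assumption. The name df1 is an assumption (borrowed from the answers), and the column is stored as character rather than factor.

```r
# Example data from the question; the name df1 is an assumption.
df1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Count the distinct non-NA values per ID.
# If the information never conflicts, every count is exactly 1.
n_values <- tapply(df1$Information, df1$ID,
                   function(x) length(unique(x[!is.na(x)])))
n_values
all(n_values == 1)
```

If any count came out as 0 (an ID with no information at all) or above 1 (conflicting values), the fill rule would have to be decided separately for those IDs.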

Thank you!

Bogs asked Jul 24 '15 13:07


4 Answers

One option is to use data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1), then, grouped by 'ID', assign (:=) to 'Information' the unique non-NA element of each group.

library(data.table)#v1.9.5+
setDT(df1)[, Information:=unique(Information[!is.na(Information)]), by = ID]
df1
#     ID Information
#  1:  1         Yes
#  2:  1         Yes
#  3:  1         Yes
#  4:  1         Yes
#  5:  2          No
#  6:  2          No
#  7:  2          No
#  8:  3       Maybe
#  9:  3       Maybe
# 10:  3       Maybe
# 11:  3       Maybe

Or we can join the dataset with its own unique rows after removing the 'NA' rows. Here df1 is still the original 'data.frame' (so df1['ID'] selects the 'ID' column), and I use the devel version of data.table.

 setDT(unique(na.omit(df1)))[df1['ID'], on='ID'] 

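Or we can use dplyr: grouped by 'ID', we arrange 'Information' within each group so that the 'NA' values come last, then set 'Information' to the first value in each group. Current versions of arrange() ignore the grouping unless .by_group = TRUE is given, so it is set explicitly here.

 library(dplyr)
 df1 %>%
    group_by(ID) %>% 
    # sort within each ID; arrange() always puts NA last
    arrange(Information, .by_group = TRUE) %>% 
    mutate(Information = first(Information))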
akrun answered Dec 08 '22 18:12


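Here is an option using na.locf (from zoo) with ddply (from plyr), where d is the data frame. By default na.locf only carries values forward and drops leading NAs, so it is applied twice: forward with na.rm = FALSE to keep the length, then backward with fromLast = TRUE to fill IDs whose first rows are NA (such as ID 3).

library(zoo)
library(plyr)

ddply(d, .(ID), mutate,
      Information = na.locf(na.locf(Information, na.rm = FALSE), fromLast = TRUE))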

#   ID Information
#1   1         Yes
#2   1         Yes
#3   1         Yes
#4   1         Yes
#5   2          No
#6   2          No
#7   2          No
#8   3       Maybe
#9   3       Maybe
#10  3       Maybe
#11  3       Maybe
Veerendra Gadekar answered Dec 08 '22 16:12


Or in base R:

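# one row per ID with its known (non-NA) value
uniqueCombns <- unique(dat[complete.cases(dat), ])
# left join the ID column against that lookup table
merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)

where dat is your data frame.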
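To make the two steps concrete, here is a minimal sketch that runs them on the example data from the question. The data frame is rebuilt by hand as an assumption about how it is stored (character column, numeric IDs), and filled is a hypothetical name for the result.

```r
# Example data from the question, using the name dat from this answer.
dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Step 1: lookup table with one row per ID (duplicates and NA rows dropped).
uniqueCombns <- unique(dat[complete.cases(dat), ])
uniqueCombns

# Step 2: join the ID column back against the lookup, filling every row.
filled <- merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)
filled
```

Note that merge() sorts the result by ID by default, so if the original rows are not already ordered by ID, the output order will differ from the input order.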

Simon Mills answered Dec 08 '22 18:12


Since DF$Information is a valid "factor" and there are no conflicts within an ID, you could also do the following (unless I'm overlooking something):

levels(DF$Information)[approxfun(DF$ID, DF$Information, method = "constant")(DF$ID)]
# [1] "Yes"   "Yes"   "Yes"   "Yes"   "No"    "No"    "No"    "Maybe" "Maybe" "Maybe" "Maybe"
alexis_laz answered Dec 08 '22 17:12