
Replace NA values with the group value

Tags: r, na

I have a data frame df (shown below) with 20 people across 5 households. Some people within a household have missing data for whether they have a med_card. I want to give these people the same value as the other people in their household (a real binary value, either 0 or 1, not NA).

I have tried the following code (using ddply from plyr), which I think is a step in the right direction but isn't fully correct, because a) it doesn't work if the first value of med_card in a household is NA, and b) it doesn't replace NA for all people in household 1.

DF<- ddply(df, .(hhold_no), function(df) {df$med_card[is.na(df$med_card)] <- head(df$med_card, na.rm=TRUE); return(df)})

Any pointers would be greatly appreciated. Thank you.

Sample df

df
   person_id hhold_no med_card
1          1        1        1
2          2        1        1
3          3        1       NA
4          4        1       NA
5          5        1       NA
6          6        2        0
7          7        2        0
8          8        2        0
9          9        2        0
10        10        3       NA
11        11        3       NA
12        12        3       NA
13        13        3        1
14        14        3        1
15        15        4        1
16        16        4        1
17        17        5        1
18        18        5        1
19        19        5       NA
20        20        5       NA

and the code to create it:

person_id<-as.numeric(c(1:20))
hhold_no<-as.numeric(c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5))
med_card<-as.numeric(c(1,1,NA,NA,NA,0,0,0,0,NA,NA,NA,1,1,1,1,1,1,NA,NA))
df<-data.frame(person_id,hhold_no, med_card)

Desired output

df
   person_id hhold_no med_card med_card_new
1          1        1        1            1
2          2        1        1            1
3          3        1       NA            1
4          4        1       NA            1
5          5        1       NA            1
6          6        2        0            0
7          7        2        0            0
8          8        2        0            0
9          9        2        0            0
10        10        3       NA            1
11        11        3       NA            1
12        12        3       NA            1
13        13        3        1            1
14        14        3        1            1
15        15        4        1            1
16        16        4        1            1
17        17        5        1            1
18        18        5        1            1
19        19        5       NA            1
20        20        5       NA            1

user2363642 asked May 10 '14 16:05




4 Answers

Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))

#   person_id hhold_no med_card med_card_new
#1          1        1        1            1
#2          2        1        1            1
#3          3        1       NA            1
#4          4        1       NA            1
#5          5        1       NA            1
#6          6        2        0            0
#7          7        2        0            0
#8          8        2        0            0
#9          9        2        0            0

Please note that this will only work if not all values in a household are NA, and the non-missing values within a household must not differ (e.g. person 1 == 1, person 2 == 0).
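
Both caveats can be guarded against explicitly. The following is a minimal base R sketch, not part of the original answer: fill_group is a hypothetical helper name, and household 6 (all NA) is added to the question's data purely for illustration. It returns NA for a household with no known value and stops if a household contains conflicting values.

```r
# Question's sample data plus a hypothetical all-NA household (6)
df <- data.frame(
  person_id = 1:22,
  hhold_no  = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5,6,6),
  med_card  = c(1,1,NA,NA,NA,0,0,0,0,NA,NA,NA,1,1,1,1,1,1,NA,NA,NA,NA)
)

# Hypothetical helper: one fill value per group, with explicit guards
fill_group <- function(x) {
  known <- unique(x[!is.na(x)])
  if (length(known) == 0) return(rep(NA_real_, length(x)))  # nothing to copy
  if (length(known) > 1) stop("conflicting values within a group")
  rep(known, length(x))
}

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN = fill_group)
```

Households 1 to 5 are filled as in the desired output, while the two people in household 6 keep NA because there is nothing to copy from.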

sgibb answered Oct 02 '22 16:10


data.table solution

library(data.table)
setDT(df)[, med_card2 := unique(med_card[!is.na(med_card)]), by = hhold_no]

#     person_id hhold_no med_card med_card2
#  1:         1        1        1         1
#  2:         2        1        1         1
#  3:         3        1       NA         1
#  4:         4        1       NA         1
#  5:         5        1       NA         1
#  6:         6        2        0         0
#  7:         7        2        0         0
#  8:         8        2        0         0
#  9:         9        2        0         0
# 10:        10        3       NA         1
# 11:        11        3       NA         1
# 12:        12        3       NA         1
# 13:        13        3        1         1
# 14:        14        3        1         1
# 15:        15        4        1         1
# 16:        16        4        1         1
# 17:        17        5        1         1
# 18:        18        5        1         1
# 19:        19        5       NA         1
# 20:        20        5       NA         1
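
If a household could consist entirely of NAs, taking the first non-missing value by index is one way to keep the assignment well defined. This is a sketch under that assumption rather than a change to the answer above; household 6 is a hypothetical addition to the question's data, and dt is just an illustrative name.

```r
library(data.table)

# Question's sample data plus a hypothetical all-NA household (6)
dt <- data.table(
  person_id = 1:22,
  hhold_no  = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5,6,6),
  med_card  = c(1,1,NA,NA,NA,0,0,0,0,NA,NA,NA,1,1,1,1,1,1,NA,NA,NA,NA)
)

# First non-missing value per household; indexing an empty vector with [1]
# yields NA, so an all-NA household simply stays NA
dt[, med_card2 := med_card[!is.na(med_card)][1], by = hhold_no]
```

Note that this variant silently takes the first known value, so it does not detect households whose non-missing values disagree.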

David Arenburg answered Oct 02 '22 17:10


That is exactly what na.aggregate in the zoo package does:

library(zoo)

transform(df, med_card_new = na.aggregate(med_card, by = hhold_no))

This uses mean by default; however, you can specify any function you like. For example, if you prefer to return NA when all items in a group are NA (rather than NaN, which is what mean would return if given a zero-length vector), then:

meanNA <- function(x, ...) if (all(is.na(x))) NA else mean(x, ...)
transform(df, med_card_new = na.aggregate(med_card, by = hhold_no, FUN = meanNA))
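
For comparison, the same group-wise fill can be sketched in base R with ave and replace. This only illustrates the idea and is not how zoo implements na.aggregate; fill_mean is a hypothetical helper name. Because it uses the group mean, a household with conflicting values (say 1 and 0) would be filled with a fraction such as 0.5 rather than a binary value.

```r
# Sample data from the question
df <- data.frame(
  person_id = 1:20,
  hhold_no  = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5),
  med_card  = c(1,1,NA,NA,NA,0,0,0,0,NA,NA,NA,1,1,1,1,1,1,NA,NA)
)

# Hypothetical helper: replace each NA with the mean of the
# non-missing values in its group
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN = fill_mean)

# A group with conflicting values is filled with a non-binary mean
mixed <- fill_mean(c(1, 0, NA))
```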

G. Grothendieck answered Oct 02 '22 17:10


Using dplyr, you could also group_by() the household and then use a function with an na.rm argument, such as max, to return the non-missing value for each group.

library(dplyr)
df %>% group_by(hhold_no) %>% mutate(med_card_new = max(med_card, na.rm = TRUE))

Given that non-missings in a group are numeric and constant, you could also use mean or min instead of max.
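
One edge case is worth knowing: if every value in a group is NA, max(..., na.rm = TRUE) returns -Inf with a warning (min returns Inf, and mean returns NaN). A small wrapper, given the hypothetical name safe_max here, returns NA instead. A base R sketch of the behaviour:

```r
# Hypothetical wrapper: NA for an all-missing group, otherwise the usual max
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

all_missing    <- c(NA_real_, NA_real_)
partly_missing <- c(NA, 1, 1)

plain     <- suppressWarnings(max(all_missing, na.rm = TRUE))  # -Inf
safe_all  <- safe_max(all_missing)                             # NA
safe_part <- safe_max(partly_missing)                          # 1
```

In the pipeline above, the wrapper would be used as mutate(med_card_new = safe_max(med_card)).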

Joe answered Oct 02 '22 16:10