Function to impute missing value [duplicate]

Question

I have a dataframe that looks like this:

set.seed(300)
df <- data.frame(site = sort(rep(paste0("site", 1:5), 5)),
                 value = sample(c(1:5, NA), 25, replace = TRUE))

df 

    site value
1  site1    NA
2  site1     5
3  site1     5
4  site1     5
5  site1     5
6  site2     1
7  site2     5
8  site2     3
9  site2     3
10 site2    NA
11 site3    NA
12 site3     2
13 site3     5
14 site3     4
15 site3     4
16 site4    NA
17 site4    NA
18 site4     4
19 site4     4
20 site4     4
21 site5    NA
22 site5     3
23 site5     3
24 site5     1
25 site5     1

As you can see, there are several missing values in the value column. I need to replace each missing value with the mean of value for the site it was measured at. So if there is a missing value at site1, I need to impute the mean of the observed values for site1.

However, the dataframe is constantly being added to and re-imported into R. The next time I import it, it will likely have grown to something like 50 rows and will probably contain many more missing values in value. I need a function that automatically detects which site each missing value was measured at and imputes the mean for that particular site. Could anybody help me with this?

nacnudus · Accepted Answer

Using impute() from package Hmisc and ddply from package plyr:

require(plyr)
require(Hmisc)

df2 <- ddply(df, "site", mutate, imputed.value = impute(value, mean))
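For anyone who would rather avoid the two package dependencies, here is a minimal base R sketch of the same idea wrapped in a reusable function, since the question asks for one. The name impute_site_mean and its arguments are made up for illustration. ave() applies the inner function within each site, so it should not matter how many rows or sites the next import has. The small data frame is toy data, not the one from the question.

```r
# Hypothetical helper: replace each NA with the mean of the observed
# values in the same group. Name and arguments are illustrative only.
impute_site_mean <- function(data, group_col = "site", value_col = "value") {
  data[[value_col]] <- ave(
    data[[value_col]], data[[group_col]],
    FUN = function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)  # mean of the non-missing values
      x
    }
  )
  data
}

# Toy data: one missing value per site
toy <- data.frame(site  = rep(c("site1", "site2"), each = 3),
                  value = c(NA, 5, 3, 1, NA, 3))

imputed <- impute_site_mean(toy)
imputed$value
# 4 5 3 1 2 3
```

Note that a site whose values are all missing has no mean to borrow, so its entries come back as NaN rather than a number.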

Max Candocia · Answer

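First, get the distinct sites. Wrapping the column in factor() keeps this working when site is a character vector, which is what data.frame() produces by default since R 4.0.0.

sites=levels(factor(df$site))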

You can then compute the mean of the non-missing values for each site:

nlevels=length(sites)
meanlist=numeric(nlevels)
# one mean per site, ignoring NAs
for (i in seq_len(nlevels))
    meanlist[i]=mean(df$value[df$site==sites[i]],na.rm=TRUE)

Then you can fill in each of the NA values. There's probably a faster way, but as long as your data set isn't huge, you can do it with for loops.

# replace each NA with the mean of its own site
for (i in seq_len(nrow(df)))
    if (is.na(df$value[i]))
         df$value[i]=meanlist[match(df$site[i],sites)]
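As for the faster way, one possible vectorised version is sketched below. It is not part of the original approach: it swaps the two loops for tapply(), which returns the per-site means as a vector named by site, and then looks each missing row up by its site name. The data frame here is toy data and the variable names are illustrative.

```r
# Toy data: one missing value per site
toy <- data.frame(site  = rep(c("site1", "site2"), each = 3),
                  value = c(NA, 5, 3, 1, NA, 3))

# Per-site means of the observed values, named by site
site_means <- tapply(toy$value, toy$site, mean, na.rm = TRUE)

# Fill every NA in one step by looking up its site's mean by name
missing <- is.na(toy$value)
toy$value[missing] <- unname(site_means[as.character(toy$site[missing])])

toy$value
# 4 5 3 1 2 3
```

The as.character() call is there so the lookup goes by site name whether site is stored as a factor or as a character vector.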

Hope this helps.

Tags: r, missing-data

luciano
