Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filling in a ton of NA data in R by indices?

Tags:

r

missing-data

na

I have Price data indexed according to three things:

State, Date, and UPC (that is the Product Code).

I have a bunch of prices that are NA.

I am trying to fill the NAs in in the following way: For a given missing Price with index (S,D,UPC), fill in with the average Price of all the data points with the same S and UPC. I.e., take the average over Date.

There must be an incredibly easy way to do this because this is very simple. I have been using for loops, but I now realize that that is incredibly inefficient and I would like to use a function, such as one in plyr or dplyr, that will do it all in as few steps as possible.

upc=c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,2105900750,2105900750,2105900750,2105900750,2105900750,2173300001,2173300001,2173300001,2173300001)
date=c(200601,200602,200603,200604,200601,200602,200601,200602,200603,200601,200602,200603,200604,200605,200606)
price=c(26,28,NA,NA,23,24,85,84,NA,81,78,24,19,98,NA)
state=c(1,1,1,1,2,2,1,1,2,2,2,1,1,1,1)

# This is what I have:
data <- data.frame(upc,date,state,price)

# This is what I want:
price=c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47)
data2 <- data.frame(upc,date,state,price)

Any advice? Thanks.

like image 436
ejn Avatar asked Dec 02 '22 13:12

ejn


1 Answers

Use ave with multiple grouping variables, and then replace NA values with the means:

with(data,
  ave(price, list(upc,state), FUN=function(x) replace(x,is.na(x),mean(x,na.rm=TRUE) ) )
)
# [1] 26.0 28.0 27.0 27.0 23.0 24.0 85.0 84.0 79.5 81.0 78.0 24.0 19.0 98.0 47.0
like image 50
thelatemail Avatar answered Dec 10 '22 09:12

thelatemail