I have Price data indexed according to three things:
State, Date, and UPC (that is the Product Code).
I have a bunch of prices that are NA.
I am trying to fill the NAs in in the following way: For a given missing Price with index (S,D,UPC), fill in with the average Price of all the data points with the same S and UPC. I.e., take the average over Date.
There must be an incredibly easy way to do this because this is very simple. I have been using for loops, but I now realize that that is incredibly inefficient and I would like to use a function, such as one in plyr or dplyr, that will do it all in as few steps as possible.
upc=c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,2105900750,2105900750,2105900750,2105900750,2105900750,2173300001,2173300001,2173300001,2173300001)
date=c(200601,200602,200603,200604,200601,200602,200601,200602,200603,200601,200602,200603,200604,200605,200606)
price=c(26,28,NA,NA,23,24,85,84,NA,81,78,24,19,98,NA)
state=c(1,1,1,1,2,2,1,1,2,2,2,1,1,1,1)
# This is what I have:
data <- data.frame(upc,date,state,price)
# This is what I want:
price=c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47)
data2 <- data.frame(upc,date,state,price)
Any advice? Thanks.
Use ave
with multiple grouping variables, and then replace NA
values with the means:
with(data,
ave(price, list(upc,state), FUN=function(x) replace(x,is.na(x),mean(x,na.rm=TRUE) ) )
)
# [1] 26.0 28.0 27.0 27.0 23.0 24.0 85.0 84.0 79.5 81.0 78.0 24.0 19.0 98.0 47.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With