
Using ave in R without NA values?

Tags:

r

I have a humongous data frame. It looks something like this:

> b
       fips      vix
1400  08005 18.58862
1401  47155 10.93712
1402  51191 10.93712
1403  47059 10.93712
1404  08005 10.93712
1405  08059 10.93712
1406  47063 10.93712
1407  37021 10.93712
1408  08031 10.93712
1409  45083 10.93712
1410  37089 10.93712
1411  37113 10.93712
1412  13207 10.93712
1413  08041 10.93712
1414  47093 21.50425
1415  08031 21.50425
1416  37009 21.50425
1417  36103 21.50425
1418  08035 21.50425
1419  08031 53.58363
1420  08035 53.58363
1421  08013 53.58363
1422  55105 21.17450
1423  08001 21.17450
1424  08031 21.17450
1425  47179 21.17450
1426  08059 21.17450
1427  37009 17.35675
1428  08041 17.35675
1429  08031 17.35675
1430  08005 17.35675
1431  08001       NA
1432  08031       NA
1433  47059       NA
1434  47145       NA
1435  13207       NA
1436  37021       NA
1437  37113       NA
1438  37089       NA

I took out some of the columns for simplicity's sake and have shown only a fraction of the rows. I am trying to change the vix column. What I am trying to do is this:

b$vix <- b$vix - ave(b$vix,b$fips)

What that SHOULD do is subtract the group mean from each value of vix. For example, for observation 1400, I want to take the average of all the observations that have fips==08005 and then compute 18.58862 minus that average. However, the problem is that there are NA values. I want the average function to IGNORE the NA values. Instead, what happens is that every row in a fips group that contains even one NA turns up as NA:

> b$vix <- b$vix - ave(b$vix,b$fips)
> b
       fips        vix
1400  08005   2.961125
1401  47155   0.000000
1402  51191   0.000000
1403  47059         NA
1404  08005  -4.690375
1405  08059  -5.118688
1406  47063   0.000000
1407  37021         NA
1408  08031         NA
1409  45083   0.000000
1410  37089         NA
1411  37113         NA
1412  13207         NA
1413  08041  -3.209812
1414  47093   0.000000
1415  08031         NA
1416  37009   2.073750
1417  36103   0.000000
1418  08035 -16.039688
1419  08031         NA
1420  08035  16.039688
1421  08013   0.000000
1422  55105   0.000000
1423  08001         NA
1424  08031         NA
1425  47179   0.000000
1426  08059   5.118688
1427  37009  -2.073750
1428  08041   3.209812
1429  08031         NA
1430  08005   1.729250
1431  08001         NA
1432  08031         NA
1433  47059         NA
1434  47145         NA
1435  13207         NA
1436  37021         NA
1437  37113         NA
1438  37089         NA

As you can see, any of the fips that have an NA will now give an NA for all the other rows with the same fips. I tried adding in na.rm=TRUE, but that doesn't do anything. I also was thinking about adding in a different function, i.e. ave(b$vix,b$fips,FUN=...) but I didn't know what to add. Maybe there is another way to do this altogether. I hope I was able to explain the problem clearly. Any and all help is appreciated!

ejn asked Jul 31 '15 18:07



1 Answer

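Passing na.rm=TRUE straight to ave() has no effect, because ave() treats every argument after x (other than FUN) as a grouping variable, so the flag never reaches mean(). Instead, pass a custom function as FUN that forwards na.rm=TRUE to mean():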

b$vix <- b$vix - ave(b$vix, b$fips, FUN=function(x) mean(x, na.rm=TRUE))
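
To see the difference between the two calls side by side, here is a minimal sketch. The vectors x and g are made-up toy data (not the asker's data frame), and the behavior described in the comments is what I would expect from ave()'s signature, ave(x, ..., FUN = mean), where everything in ... is used for grouping.

```r
# Toy data, invented for illustration only.
x <- c(18, 19, NA, 10, 20)
g <- c("a", "a", "a", "b", "b")

# na.rm = TRUE falls into `...`, so ave() uses it as an extra grouping
# variable; mean() is still called without na.rm.
direct <- ave(x, g, na.rm = TRUE)

# Wrapping mean() in an anonymous function forwards na.rm to mean() itself.
wrapped <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))

print(direct)   # group "a" is expected to be all NA
print(wrapped)  # group "a" is expected to be the mean of 18 and 19
```

The wrapped version still returns a group mean for the row whose own value is NA, so subtracting it from x leaves only that one row as NA.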

Tested the fix with:

b<-read.table(text="      fips      vix
08005 18
08005 19
08005 20
47155 10
47155 NA
47155 20", header=TRUE)

b$vix <- b$vix - ave(b$vix, b$fips, FUN=function(x) mean(x, na.rm=TRUE))
b
#    fips vix
# 1  8005  -1
# 2  8005   0
# 3  8005   1
# 4 47155  -5
# 5 47155  NA
# 6 47155   5
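
One side note on the test data: read.table() parses fips as an integer here, which is why 08005 prints as 8005. If the leading zeros matter (as they do for real FIPS codes), a possible variant is to read that column as character. This is a sketch under that assumption; the colClasses choice is mine, not part of the original question.

```r
# Same toy data as above, but fips is kept as character so "08005"
# does not lose its leading zero.
b <- read.table(text = "fips vix
08005 18
08005 19
08005 20
47155 10
47155 NA
47155 20", header = TRUE, colClasses = c("character", "numeric"))

# Subtract the per-fips mean, ignoring NA values when averaging.
b$vix <- b$vix - ave(b$vix, b$fips, FUN = function(x) mean(x, na.rm = TRUE))
print(b)
```

The grouping works the same way on character codes, and only the row whose own vix is NA stays NA.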
MrFlick answered Nov 17 '22 23:11
