I have a humongous data frame. It looks something like this:
> b
fips vix
1400 08005 18.58862
1401 47155 10.93712
1402 51191 10.93712
1403 47059 10.93712
1404 08005 10.93712
1405 08059 10.93712
1406 47063 10.93712
1407 37021 10.93712
1408 08031 10.93712
1409 45083 10.93712
1410 37089 10.93712
1411 37113 10.93712
1412 13207 10.93712
1413 08041 10.93712
1414 47093 21.50425
1415 08031 21.50425
1416 37009 21.50425
1417 36103 21.50425
1418 08035 21.50425
1419 08031 53.58363
1420 08035 53.58363
1421 08013 53.58363
1422 55105 21.17450
1423 08001 21.17450
1424 08031 21.17450
1425 47179 21.17450
1426 08059 21.17450
1427 37009 17.35675
1428 08041 17.35675
1429 08031 17.35675
1430 08005 17.35675
1431 08001 NA
1432 08031 NA
1433 47059 NA
1434 47145 NA
1435 13207 NA
1436 37021 NA
1437 37113 NA
1438 37089 NA
I took out some of the columns for simplicity's sake and have only shown a fraction of the rows. I am trying to change the vix column. What I am trying to do is this:
b$vix <- b$vix - ave(b$vix,b$fips)
What that SHOULD do is subtract the group mean from each value of vix. For example, for observation 1400, I want to take the average of all the observations that have fips==08005 and then do 18.58862 minus that average. However, the problem is that there are NA values. I want the average function to IGNORE the NA values. Instead, what happens is that every row in a fips group that contains at least one NA turns up as NA:
> b$vix <- b$vix - ave(b$vix,b$fips)
> b
fips vix
1400 08005 2.961125
1401 47155 0.000000
1402 51191 0.000000
1403 47059 NA
1404 08005 -4.690375
1405 08059 -5.118688
1406 47063 0.000000
1407 37021 NA
1408 08031 NA
1409 45083 0.000000
1410 37089 NA
1411 37113 NA
1412 13207 NA
1413 08041 -3.209812
1414 47093 0.000000
1415 08031 NA
1416 37009 2.073750
1417 36103 0.000000
1418 08035 -16.039688
1419 08031 NA
1420 08035 16.039688
1421 08013 0.000000
1422 55105 0.000000
1423 08001 NA
1424 08031 NA
1425 47179 0.000000
1426 08059 5.118688
1427 37009 -2.073750
1428 08041 3.209812
1429 08031 NA
1430 08005 1.729250
1431 08001 NA
1432 08031 NA
1433 47059 NA
1434 47145 NA
1435 13207 NA
1436 37021 NA
1437 37113 NA
1438 37089 NA
As you can see, any of the fips that have an NA will now give an NA for all the other rows with the same fips. I tried adding in na.rm=TRUE, but that doesn't do anything. I also was thinking about adding in a different function, i.e. ave(b$vix,b$fips,FUN=...) but I didn't know what to add. Maybe there is another way to do this altogether. I hope I was able to explain the problem clearly. Any and all help is appreciated!
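ave() treats its extra arguments as grouping variables, so na.rm=TRUE passed directly to ave() never reaches mean(). Instead, pass a custom function as FUN that sets the na.rm=TRUE flag on mean():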
b$vix <- b$vix - ave(b$vix, b$fips, FUN=function(x) mean(x, na.rm=TRUE))
Tested with
b<-read.table(text=" fips vix
08005 18
08005 19
08005 20
47155 10
47155 NA
47155 20", header=TRUE)
b$vix <- b$vix - ave(b$vix, b$fips, FUN=function(x) mean(x, na.rm=TRUE))
b
# fips vix
# 1 8005 -1
# 2 8005 0
# 3 8005 1
# 4 47155 -5
# 5 47155 NA
# 6 47155 5
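If it helps to see the intermediate step, here is a minimal sketch of the same idea with the group means kept in their own vector. The toy data, the helper name mean_no_na, and the column name vix_centered are made up for illustration; fips is stored as character here, which is an assumption on my part so that the leading zeros are not dropped.

```r
# Hypothetical toy data; fips is character so "08005" keeps its leading zero.
b <- data.frame(
  fips = c("08005", "08005", "08005", "47155", "47155", "47155"),
  vix  = c(18, 19, 20, 10, NA, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a mean that skips NA values.
mean_no_na <- function(x) mean(x, na.rm = TRUE)

# ave() returns one value per row (the mean of that row's group),
# in the original row order.
group_mean <- ave(b$vix, b$fips, FUN = mean_no_na)
group_mean
# 19 19 19 15 15 15

# Subtract the group mean; a row whose own vix is NA stays NA,
# but it no longer turns the rest of its group into NA.
b$vix_centered <- b$vix - group_mean
b$vix_centered
# -1  0  1 -5 NA  5
```

Writing the result to a new column instead of overwriting vix is just a choice for the sketch, so the original values stay available for checking.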