I have a lot of csv files of temperature data which I am importing into R to process. These files look like:
ID Date.Time temp1 temp2
1 08/13/17 14:48:18 15.581 -0.423
2 08/13/17 16:48:18 17.510 -0.423
3 08/13/17 18:48:18 15.390 -0.423
Sometimes the temperature readings in columns 3 and 4 are clearly wrong and have to be replaced with NA values. I know that anything over 50 or under -50 is an error. I'd like to just remove these right away. Using
df[,c(3,4)] <- replace(df[,c(3,4)], df[,c(3,4)] > 50, NA)
df[,c(3,4)] <- replace(df[,c(3,4)], df[,c(3,4)] < -50, NA)
works but I don't really want to have to repeat this for every file because it seems messy.
I would like to make a function that does all of this, something like:
df <- remove.errors(df[,c(3,4)])
I've tried:
remove.errors <- function(df) {
  df[,] <- replace(df[,], df[,] > 50, NA)
  df[,] <- replace(df[,], df[,] < -50, NA)
}
df <- remove.errors(df[,c(3,4)])
This replaces the bad values, but unfortunately the result only keeps the 3rd and 4th columns and the first two disappear. I've played around with this code for far too long and tried some other things which didn't work at all.
I know I'm probably missing something basic. Anyone have any tips on making a function which will replace values in columns 3 and 4 with NAs without changing the first two columns?
1) Try this. It uses only base R.
clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)
df[3:4] <- clean(df[3:4])
1a) Alternatively, we could do this (which does not overwrite df):
transform(df, temp1 = clean(temp1), temp2 = clean(temp2))
2) Adding in magrittr we could do this:
library(magrittr)
df[3:4] %<>% { clean(.) }
3) In dplyr we could do this:
library(dplyr)
df %>% mutate_at(3:4, clean)
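In newer dplyr versions (1.0.0 and later), mutate_at() still works but has been superseded by across(); the same idea as a sketch, assuming one of those versions:
library(dplyr)
df %>% mutate(across(3:4, clean))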
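Finally, since you have many files, and since your own attempt lost columns 1 and 2 because the function was given (and returned) only df[,c(3,4)], here is a hedged sketch of a wrapper that takes the whole data frame and reuses clean() from above. The names remove.errors, limit, and the folder "temperature_data" are just illustrative, not from any package:
remove.errors <- function(df, cols = 3:4, limit = 50) {
  # clean only the temperature columns, leaving the others untouched
  df[cols] <- clean(df[cols], max = limit)
  # return the whole data frame, not just the cleaned columns
  df
}

df <- remove.errors(df)

# applied to every csv file in a folder (paths are hypothetical):
files <- list.files("temperature_data", pattern = "\\.csv$", full.names = TRUE)
cleaned <- lapply(files, function(f) remove.errors(read.csv(f)))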