How do I make a function in R to check for data errors?

Question

I have a lot of CSV files of temperature data which I am importing into R to process. These files look like this:

ID   Date.Time          temp1    temp2
1    08/13/17 14:48:18  15.581  -0.423
2    08/13/17 16:48:18  17.510  -0.423
3    08/13/17 18:48:18  15.390  -0.423

Sometimes the temperature readings in columns 3 and 4 are clearly wrong and have to be replaced with NA values. I know that anything over 50 or under -50 is an error. I'd like to just remove these right away. Using

df[, c(3, 4)] <- replace(df[, c(3, 4)], df[, c(3, 4)] > 50, NA)
df[, c(3, 4)] <- replace(df[, c(3, 4)], df[, c(3, 4)] < -50, NA)

works but I don't really want to have to repeat this for every file because it seems messy.

I would like to make a function to replace all this like:

df<-remove.errors(df[,c(3,4)])

I've tried:

remove.errors<-function (df) {
  df[,]<- replace(df[,], df[,] > 50, NA)
  df[,]<- replace(df[,], df[,] < -50, NA)
  }

df<-remove.errors(df[,c(3,4)])

This works but unfortunately only keeps the 3rd and 4th columns and the first two disappear. I've played around with this code for far too long and tried some other things which didn't work at all.

I know I'm probably missing something basic. Anyone have any tips on making a function which will replace values in columns 3 and 4 with NAs without changing the first two columns?

G. Grothendieck · Accepted Answer

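1) Try this. It uses only base R. The function cleans whatever columns it is given, and assigning the result back into df[3:4] rather than into df is what keeps the first two columns.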

clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)
df[3:4] <- clean(df[3:4])
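
For illustration, here is a small self-contained run of the above. The data frame is made up to match the layout in the question (the out-of-range readings 99.9 and -75 are invented), so treat it as a sketch rather than real output from the asker's files.

```r
# Same helper as above: values outside [min, max] become NA.
clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)

# Made-up sample data in the layout shown in the question.
df <- data.frame(
  ID = 1:3,
  Date.Time = c("08/13/17 14:48:18", "08/13/17 16:48:18", "08/13/17 18:48:18"),
  temp1 = c(15.581, 99.9, 15.390),
  temp2 = c(-0.423, -0.423, -75)
)

# Assign back into columns 3 and 4 only; ID and Date.Time are left alone.
df[3:4] <- clean(df[3:4])
print(df)

# min defaults to -max, so a single argument sets a symmetric range.
narrow <- clean(c(-30, 10, 30), max = 20)
print(narrow)
```

After this, temp1 in row 2 and temp2 in row 3 are NA, and df still has all four columns.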

1a) Alternatively we could do this, which returns a cleaned copy and leaves df itself unchanged unless the result is assigned back:

transform(df, temp1 = clean(temp1), temp2 = clean(temp2))

2) Adding in magrittr we could do this:

library(magrittr)
df[3:4] %<>% { clean(.) }

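3) In dplyr we could do this (in dplyr 1.0.0 and later mutate_at is superseded by across, so df %>% mutate(across(3:4, clean)) is the current equivalent):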

library(dplyr)

df %>% mutate_at(3:4, clean)
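
Since the question mentions many files, one possible way to avoid repeating the cleaning step is to wrap reading and cleaning together and apply that over a list of paths. This is only a sketch in base R: read_clean and the folder name "data" are hypothetical, and it assumes the files can be read with read.csv and that the temperatures are always in columns 3 and 4.

```r
clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)

# Hypothetical helper: read one CSV file and clean the chosen columns.
# Extra arguments (max, min) are passed through to clean().
read_clean <- function(path, cols = 3:4, ...) {
  d <- read.csv(path)
  d[cols] <- clean(d[cols], ...)
  d
}

# Hypothetical folder name; point this at wherever the CSV files live.
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# One cleaned data frame per file, named after the file it came from.
cleaned <- lapply(files, read_clean)
names(cleaned) <- basename(files)
```

Each element of cleaned is then a full data frame with only the temperature columns altered.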

Tags:

function

r

data-cleaning

user97878
