
Cleaning `Inf` values from an R dataframe

In R, I have an operation which creates some Inf values when I transform a dataframe.

I would like to turn these Inf values into NA values. The code I have is slow for large data. Is there a faster way of doing this?

Say I have the following dataframe:

dat <- data.frame(a=c(1, Inf), b=c(Inf, 3), d=c("a","b")) 

The following works in a single case:

 dat[,1][is.infinite(dat[,1])] = NA 

So I generalized it with the following loop:

cf_DFinf2NA <- function(x) {
    for (i in 1:ncol(x)){
        x[,i][is.infinite(x[,i])] = NA
    }
    return(x)
}

But I don't think that this is really using the power of R.

Asked by ricardo on Aug 30 '12 at 00:08



1 Answer

Option 1

Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.

do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA)))
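As a minimal sketch, here is Option 1 applied to the two-row example from the question. The name `clean` is only illustrative. Because `is.infinite` returns FALSE for non-numeric columns, column `d` should pass through unchanged.

```r
# Example data from the question
dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))

# Replace infinite entries column by column, then rebuild the data.frame.
# For the non-numeric column d, is.infinite() is FALSE everywhere,
# so replace() leaves it as it is.
clean <- do.call(data.frame,
                 lapply(dat, function(x) replace(x, is.infinite(x), NA)))

print(clean)
```

Note that `dat` itself is not modified; the cleaned copy has to be assigned to a name.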

Option 2 -- data.table

You could use data.table and set. set modifies the table by reference, which avoids some internal copying.

library(data.table)
DT <- data.table(dat)
invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA)))

Or using column numbers (possibly faster if there are a lot of columns):

for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA) 
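The column-number variant can be tried on the question's small example as follows. This is a sketch that assumes the data.table package is installed. Using `as.data.table` and `seq_len` instead of `data.table(dat)` and `1:ncol(DT)` is a stylistic choice here, not part of the original answer.

```r
library(data.table)  # assumption: the data.table package is installed

dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))
DT <- as.data.table(dat)

# set() assigns by reference: DT is updated in place and the loop
# does not need to capture any return value.
for (j in seq_len(ncol(DT))) {
  set(DT, i = which(is.infinite(DT[[j]])), j = j, value = NA)
}

print(DT)
```

Since `DT` is changed in place, keep a copy (for example with `copy(DT)`) beforehand if the original values are still needed.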

Timings

# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6),
                  c = rep(c('a','b'),1e6),d = rep(c(1,Inf), 1e6),
                  e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)

# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
#  user  system elapsed
#  0.52    0.01    0.53

# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
#  user  system elapsed
# 32.96    0.07   33.12

# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
#  user  system elapsed
#  1.22    0.38    1.60

# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
#  user  system elapsed
#  0.29    0.02    0.31

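data.table is the quickest (0.31 s elapsed, against 0.53 s for replace). Using sapply slows things down noticeably: the sapply-based is.na version took about 33 s, compared with 1.60 s once sapply was swapped for do.call(cbind, lapply(...)).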
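The three base R variants timed above should differ only in speed, not in their result. A quick sanity check on a small made-up data frame (not the benchmark data; the names `r1`, `r2`, `r3` are illustrative) might look like this:

```r
# Small illustrative data, including a negative infinity
dat <- data.frame(a = c(1, Inf, -Inf), b = c(Inf, 3, 4), d = c("a", "b", "c"))

# replace() applied column by column
r1 <- do.call(data.frame,
              lapply(dat, function(x) replace(x, is.infinite(x), NA)))

# is.na<- with a logical matrix built by sapply()
r2 <- dat
is.na(r2) <- sapply(r2, is.infinite)

# is.na<- with a logical matrix built by cbind-ing the per-column results
r3 <- dat
is.na(r3) <- do.call(cbind, lapply(r3, is.infinite))

# All three approaches should agree
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(r2, r3)))
```

`is.infinite` flags both `Inf` and `-Inf`, so both end up as NA in every variant.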

Answered by mnel on Oct 03 '22 at 18:10