In R, I have an operation which creates some Inf values when I transform a dataframe. I would like to turn these Inf values into NA values. The code I have works, but it is slow for large data; is there a faster way of doing this?
Say I have the following dataframe:
dat <- data.frame(a=c(1, Inf), b=c(Inf, 3), d=c("a","b"))
The following works for a single column:
dat[,1][is.infinite(dat[,1])] = NA
So I generalized it with the following loop:
cf_DFinf2NA <- function(x) {
  # loop over columns, replacing Inf/-Inf with NA in each
  for (i in 1:ncol(x)) {
    x[, i][is.infinite(x[, i])] <- NA
  }
  return(x)
}
But I don't think that this is really using the power of R.
Some output values can be infinite, e.g. after dividing by zero. To find such values we use is.infinite(): it returns TRUE for every infinite element of the given vector. Using the ifelse function, you can then replace Inf with NA, or with zero, element by element. Inf exists because R follows the international standard for floating-point arithmetic (IEEE 754): technically, Inf is a valid numeric value that results from calculations like dividing a nonzero number by zero.
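For example, a minimal sketch of the element-wise ifelse approach on a copy of the example data (whether you substitute NA or 0 is up to you):

tmp <- dat                                        # work on a copy so dat is unchanged
tmp$a <- ifelse(is.infinite(tmp$a), NA, tmp$a)    # Inf/-Inf in column a become NA
tmp$b <- ifelse(is.infinite(tmp$b), 0, tmp$b)     # or replace with zero instead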
Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.
do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA)))
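For reuse, this can be wrapped in a small helper; the function name below is just illustrative:

# Hypothetical helper: replace Inf/-Inf with NA in every column of a data.frame
inf_to_na <- function(df) {
  do.call(data.frame, lapply(df, function(x) replace(x, is.infinite(x), NA)))
}
na_dat <- inf_to_na(dat)   # Inf in columns a and b become NA; column d is untouched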
Alternatively, you could use data.table and set. This avoids some internal copying.
DT <- data.table(dat)
invisible(lapply(names(DT), function(.name)
  set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA)))
Or using column numbers (possibly faster if there are a lot of columns):
for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)
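If you need a plain data.frame back afterwards, the table can be converted in place; a brief sketch, assuming a data.table version that provides setDF:

setDF(DT)     # convert the modified data.table back to a data.frame, by reference
class(DT)     # "data.frame"

Benchmarking the approaches on some larger data: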
# some big(ish) data
dat <- data.frame(a = rep(c(1, Inf), 1e6), b = rep(c(Inf, 2), 1e6),
                  c = rep(c('a', 'b'), 1e6), d = rep(c(1, Inf), 1e6),
                  e = rep(c(Inf, 2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)

# replace (@mnel)
system.time(na_dat <- do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA))))
#  user  system elapsed
#  0.52    0.01    0.53

# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
#  user  system elapsed
# 32.96    0.07   33.12

# modified is.na
system.time(is.na(dat) <- do.call(cbind, lapply(dat, is.infinite)))
#  user  system elapsed
#  1.22    0.38    1.60

# data.table (@mnel)
system.time(invisible(lapply(names(DT), function(.name)
  set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA))))
#  user  system elapsed
#  0.29    0.02    0.31
data.table is the quickest. Using sapply slows things down noticeably.