I have a very large dataframe(df) with approximately 35-45 columns(variables) and rows greater than 300. Some of the rows contains NA,NaN,Inf,-Inf values in single or multiple variables and I have used
na.omit(df)
to remove rows with NA and NaN but I cant remove rows with Inf and -Inf values using na.omit function.
While searching I came across this thread Remove rows with Inf and NaN in R and used the modified code df[is.finite(df)]
but its not removing the rows with Inf and -Inf and also gives this error
Error in is.finite(df) : default method not implemented for type 'list'
EDITED
Remove the entire row even the corresponding one or multiple columns have inf and -inf
To remove the rows in R, use the subsetting in R. There is no built-in function of removing a row from the data frame, but you can access a data frame without some rows specified by the negative index. This process is also called subsetting. This way, you can remove unwanted rows from the data frame.
So we can replace Inf with NA by using lapply() function, we have to create our own function to replace Inf with NA and then pass that function to lapply() through do. call method. Where value is the input value and replace() is used to replace the value to NA if it is infinite.
To remove all rows having NA, we can use na. omit function. For Example, if we have a data frame called df that contains some NA values then we can remove all rows that contains at least one NA by using the command na. omit(df).
To remove the rows with +/-Inf
I'd suggest the following:
df <- df[!is.infinite(rowSums(df)),]
or, equivalently,
df <- df[is.finite(rowSums(df)),]
The second option (the one with is.finite()
and without the negation) removes also rows containing NA
values in case that this has not already been done.
Depending on the data, there are a couple options using scoped variants of dplyr::filter()
and is.finite()
or is.infinite()
that might be useful:
library(dplyr)
# sample data
df <- data_frame(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# across all columns:
df %>%
filter_all(all_vars(!is.infinite(.)))
# note that is.finite() does not work with NA or strings:
df %>%
filter_all(all_vars(is.finite(.)))
# checking only numeric columns:
df %>%
filter_if(~is.numeric(.), all_vars(!is.infinite(.)))
# checking only select columns, in this case a through c:
df %>%
filter_at(vars(a:c), all_vars(!is.infinite(.)))
The is.finite
works on vector
and not on data.frame
object. So, we can loop through the data.frame
using lapply
and get only the 'finite' values.
lapply(df, function(x) x[is.finite(x)])
If the number of Inf
, -Inf
values are different for each column, the above code will have a list
with elements having unequal length
. So, it may be better to leave it as a list
. If we want a data.frame
, it should have equal lengths.
If we want to remove rows contain any NA or Inf/-Inf values
df[Reduce(`&`, lapply(df, function(x) !is.na(x) & is.finite(x))),]
Or a compact option by @nicola
df[Reduce(`&`, lapply(df, is.finite)),]
If we are ready to use a package, a compact option would be NaRV.omit
library(IDPmisc)
NaRV.omit(df)
set.seed(24)
df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf),
20*5, replace=TRUE), ncol=5))
To keep the rows without Inf
we can do:
df[apply(df, 1, function(x) all(is.finite(x))), ]
Also NA
s are handled by this because of:
a rowindex with value NA
will remove this row in the result.
Also rows with NaN
are not in the result.
set.seed(24)
df <- as.data.frame(matrix(sample(c(0:9, NA, -Inf, Inf, NaN), 20*5, replace=TRUE), ncol=5))
df2 <- df[apply(df, 1, function(x) all(is.finite(x))), ]
Here are the results of the different is.~
-functions:
x <- c(42, NA, NaN, Inf)
is.finite(x)
# [1] TRUE FALSE FALSE FALSE
is.na(x)
# [1] FALSE TRUE TRUE FALSE
is.nan(x)
# [1] FALSE FALSE TRUE FALSE
I had this problem and none of the above solutions worked for me. I used the following to remove rows with +/-Inf in columns 15 and 16 of my dataframe.
d<-subset(c, c[,15:16]!="-Inf")
e<-subset(d, d[,15:16]!="Inf")
It took me awhile to work this out for dplyr 1.0.0 so i thought i would put up the new version of @sbha solutions using c_across
since filter_all
, filter_if
are getting deprecated.
library(dplyr)
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
rowwise %>%
filter(!all(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 4 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
rowwise %>%
filter(!any(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
df %>%
rowwise %>%
filter(!any(is.infinite(c_across(a:c))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
To be honest I think @sbha answer is simpler!
df[!is.infinite(df$x),]
wherein x is the column of df that contains the infinite values. The first answer posted was contingent on rowsums but for my own problem, the df had columns which could not be added.
I consider myself new to coding and I couldn't get the recommendations above to work with my code.
I found a less complicated way to reduce a dataframe with 2 lines, first by replacing Inf with Na, then by selecting rows with complete data:
Df[sapply(Df, is.infinite)] <- NA
Df<-Df[complete.cases(Df), ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With