I was using sum(is.na(my.df)) to check whether my data frame contained any NAs, which worked as I expected, but sum(is.nan(my.df)) did not work as I expected.
> my.df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
> my.df
  a   b
1 1   5
2 2  NA
3 3 NaN
> is.na(my.df)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,] FALSE  TRUE
> is.nan(my.df)
    a     b
FALSE FALSE
> sum(is.na(my.df))
[1] 2
> sum(is.nan(my.df))
[1] 0
Oh dear. Is there a reason for the inconsistency in behaviour? Is it due to a missing implementation, or is it intentional? What does the return value of is.nan(my.df) signify? Is there a good reason not to use is.nan() on a whole data frame?
In the documentation for is.na() and is.nan(), the argument types seem the same (although they don't specifically list data frames):
is.na(): x: R object to be tested: the default methods handle atomic vectors, lists and pairlists.
is.nan(): x: R object to be tested: the default methods handle atomic vectors, lists and pairlists.
From ?is.nan:
All elements of logical, integer and raw vectors are considered not to be NaN, and elements of lists and pairlists are also unless the element is a length-one numeric or complex vector whose single element is NaN.
The columns of a data frame are technically "elements of a list", so is.nan(df) returns a logical vector with one entry per column of the data frame. An entry is TRUE only if that column consists of a single NaN element:
> is.nan(data.frame(a=NaN,b=NA,c=1))
    a     b     c
 TRUE FALSE FALSE
If you want behavior matching that of is.na, use apply:
sum(apply(my.df,2,is.nan))
The answer is 1 rather than 2 because is.nan(NA) is FALSE.
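To make the asymmetry between the two tests concrete, here is a minimal sketch on a plain numeric vector that mirrors column b from the question. The variable names (x, na_flags, nan_flags, plain_na_count) are made up for illustration.

```r
# Small vector with one NA and one NaN, mirroring column b from the question
x <- c(5, NA, NaN)

na_flags  <- is.na(x)   # TRUE for both NA and NaN
nan_flags <- is.nan(x)  # TRUE only for NaN

print(na_flags)
print(nan_flags)

# Entries that are missing but not NaN, i.e. "plain" NA
plain_na_count <- sum(na_flags & !nan_flags)
print(plain_na_count)
```

So is.na() counts NaN as missing, while is.nan() does not count NA, which is why the two sums differ.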
edit: alternatively, you can just turn the data frame into a matrix:
sum(is.nan(as.matrix(my.df)))
update: this behaviour changed shortly (two months) after the question was asked, in R version 2.14 (October 2011): from the NEWS file,
o The default methods for is.finite(), is.infinite() and is.nan() now signal an error if their argument is not an atomic vector.
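Since the default method signals an error on a data frame in those later versions, one way to get the count is to test each column separately. The following is only a sketch, assuming the my.df from the question; nan_per_column and total_nan are illustrative names.

```r
# Data frame from the question
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# Apply is.nan() to each column (an atomic vector) and count the hits.
# vapply() checks that every column yields exactly one integer.
nan_per_column <- vapply(my.df, function(col) sum(is.nan(col)), integer(1))
print(nan_per_column)

# Total number of NaN values in the whole data frame
total_nan <- sum(nan_per_column)
print(total_nan)
```

Working column by column avoids the conversion to a matrix, which can matter when the columns do not all share one type.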
The is.nan function does not work with lists for some odd reason. Why it differs from is.na is beyond me and appears to be a language design issue. However, there is a simple solution:
df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
df <- data.frame(sapply(df, function(x) ifelse(is.nan(x), NA, x)))
df
  a  b
1 1  5
2 2 NA
3 3 NA
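One caveat with the sapply approach is that it simplifies its result to a matrix, so a data frame mixing numeric and character columns would likely come back with everything as character. A possible variant, sketched here under the assumption that keeping column types matters, assigns into df[] with lapply instead. The label column and the replace_nan helper are hypothetical additions for the example.

```r
# Example data with a numeric column containing NaN and a character column
df <- data.frame(a = c(1, 2, 3),
                 b = c(5, NA, NaN),
                 label = c("x", "y", "z"),  # hypothetical extra column
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace NaN with NA inside a single column
replace_nan <- function(col) {
  col[is.nan(col)] <- NA
  col
}

# df[] <- ... keeps the data frame structure and each column's own type
df[] <- lapply(df, replace_nan)
print(df)
```

Here is.nan() is only ever called on individual columns, so it never sees the data frame itself.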