Data frames and is.nan()


I was using sum(is.na(my.df)) to check whether my data frame contained any NAs, which worked as I expected, but sum(is.nan(my.df)) did not.

    > my.df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
    > my.df
      a   b
    1 1   5
    2 2  NA
    3 3 NaN
    > is.na(my.df)
             a     b
    [1,] FALSE FALSE
    [2,] FALSE  TRUE
    [3,] FALSE  TRUE
    > is.nan(my.df)
        a     b
    FALSE FALSE
    > sum(is.na(my.df))
    [1] 2
    > sum(is.nan(my.df))
    [1] 0

Oh dear. Is there a reason for the inconsistency in behaviour? Is it for a lack of implementation, or is it intentional? What does the return value of is.nan(my.df) signify? Is there a good reason not to use is.nan() on a whole data frame?

In the documentation for is.na() and is.nan(), the argument types seem the same (although they don't specifically list data frames):

is.na(): x: R object to be tested: the default methods handle atomic vectors, lists and pairlists.

is.nan(): x: R object to be tested: the default methods handle atomic vectors, lists and pairlists.

asked Aug 11 '11 at 18:08 by Zach


2 Answers

From ?is.nan:

All elements of logical, integer and raw vectors are considered not to be NaN, and elements of lists and pairlists are also unless the element is a length-one numeric or complex vector whose single element is NaN.

The columns of a data frame are technically "elements of a list", so is.nan(df) returns a vector with length equal to the number of columns of the data frame, which is TRUE only if the column consists of a single NaN element:

    > is.nan(data.frame(a=NaN,b=NA,c=1))
        a     b     c
     TRUE FALSE FALSE
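The rule quoted from ?is.nan can be written out explicitly. The sketch below is a hypothetical helper, old_is_nan_list() (not part of base R), that applies that rule to each element of a list or data frame, which reproduces the pre-2.14 result shown above.

```r
# Hypothetical helper (not base R): applies the rule quoted from ?is.nan
# to each element of a list, as is.nan() did for lists before R 2.14.
old_is_nan_list <- function(x) {
  vapply(x, function(el) {
    # Only a length-one numeric or complex vector holding NaN counts as NaN
    (is.numeric(el) || is.complex(el)) && length(el) == 1L && is.nan(el)
  }, logical(1))
}

old_is_nan_list(data.frame(a = NaN, b = NA, c = 1))
# returns c(a = TRUE, b = FALSE, c = FALSE)

old_is_nan_list(data.frame(a = c(1, 2, 3), b = c(5, NA, NaN)))
# returns c(a = FALSE, b = FALSE): columns of length 3 never count as NaN
```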

If you want behavior matching that of is.na, use apply:

sum(apply(my.df,2,is.nan)) 

The answer is 1 rather than 2 because is.nan(NA) is FALSE: the NA in column b is missing, but it is not NaN.
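Counting per column makes the difference between is.na() and is.nan() visible. This sketch rebuilds my.df from the question and uses vapply() so that each column is tested on its own, without first converting the data frame to a matrix.

```r
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
nan_per_col <- vapply(my.df, function(col) sum(is.nan(col)), integer(1))
na_per_col  <- vapply(my.df, function(col) sum(is.na(col)),  integer(1))

nan_per_col        # a = 0, b = 1
na_per_col         # a = 0, b = 2
sum(nan_per_col)   # 1, the same total as sum(apply(my.df, 2, is.nan))
```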

edit: alternatively, you can just turn the data frame into a matrix:

 sum(is.nan(as.matrix(my.df))) 
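One caveat applies to both this and the apply() version, since apply() also converts a data frame to a matrix first: if any column is non-numeric, the whole matrix becomes character and NaN values are no longer detected. The data frame below, with its character column y, is made up for illustration; testing column by column avoids the coercion.

```r
# Hypothetical data frame mixing a numeric column with a character column
mixed <- data.frame(x = c(1, NaN), y = c("a", "b"), stringsAsFactors = FALSE)

m <- as.matrix(mixed)
typeof(m)                  # "character": every column was coerced
sum(is.nan(m))             # 0: the NaN is no longer a double
sum(apply(mixed, 2, is.nan))  # also 0, for the same reason

# Testing each column separately still finds it
sum(vapply(mixed, function(col) sum(is.nan(col)), integer(1)))  # 1
```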

update: this behaviour changed shortly (two months) after the question was asked, in R version 2.14 (October 2011): from the NEWS file,

o The default methods for is.finite(), is.infinite() and is.nan() now signal an error if their argument is not an atomic vector.
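In current R, therefore, is.nan() on a data frame stops with an error instead of returning the per-column result. is.na() keeps working because base R ships an is.na.data.frame() method, whereas there is no is.nan.data.frame(). Since is.nan() is an internal generic, one option is to write that method yourself. The version below is a user-written sketch under that assumption, not something provided by R; it returns a logical matrix with one column per data frame column, like is.na() does.

```r
my.df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# In current R the default method rejects lists, including data frames
res <- tryCatch(is.nan(my.df), error = function(e) conditionMessage(e))
res   # an error message rather than a logical result

# is.na() has a data frame method in base R; is.nan() does not
exists("is.na.data.frame")    # TRUE
exists("is.nan.data.frame")   # FALSE until the definition below

# User-defined S3 method (an assumption, not shipped with R): test each
# column separately and collect the results in a logical matrix
is.nan.data.frame <- function(x) {
  out <- matrix(FALSE, nrow = nrow(x), ncol = ncol(x),
                dimnames = list(row.names(x), names(x)))
  for (j in seq_along(x)) out[, j] <- is.nan(x[[j]])
  out
}

is.nan(my.df)        # 3 x 2 logical matrix, TRUE only in row 3 of column b
sum(is.nan(my.df))   # 1
```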

answered Oct 05 '22 at 22:10 by Ben Bolker



The is.nan() function does not work on lists (and therefore not on data frames, which are lists of columns) for some odd reason. Why it differs from is.na() is beyond me; it appears to be a language design issue. However, there is a simple workaround that converts every NaN to NA:

    df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
    df <- data.frame(sapply(df, function(x) ifelse(is.nan(x), NA, x)))
    df
      a  b
    1 1  5
    2 2 NA
    3 3 NA
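One thing to watch with the sapply() version: sapply() simplifies the per-column results into a single matrix, so if the data frame also has a character column, all columns are coerced to a common type and the numbers turn into strings. A minimal sketch of the same NaN-to-NA replacement using lapply() with df[] <- ..., which keeps each column's own type; nan_to_na() is a hypothetical helper name.

```r
# Hypothetical helper: replace NaN with NA column by column, keeping column types
nan_to_na <- function(df) {
  df[] <- lapply(df, function(x) replace(x, is.nan(x), NA))
  df
}

df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN), s = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
out <- nan_to_na(df)
str(out)   # a and b stay numeric, s stays character; b is now 5, NA, NA
```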
answered Oct 06 '22 at 00:10 by Adam Erickson
