Unexpected behaviour of function table with "NaN" values

Tags:

r

na

Recently, I ran into behaviour in the table function that was not what I expected.

For example, let's take the following vector:

ex_vec <- c("Non", "Non", "Nan", "Oui", "NaN", NA)

If I check for NA values in my vector, "NaN" is not considered one (as expected):

is.na(ex_vec)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE

But if I try to get the frequencies of the different values:

table(ex_vec)
#ex_vec
#Nan Non Oui 
#  1   2   1

"NaN" does not appear in the table.

However, if I "ask" table to show the NA values, I get this:

table(ex_vec, useNA="ifany")
#ex_vec
# Nan  NaN  Non  Oui <NA> 
#   1    1    2    1    1

So the character string "NaN" is treated as an NA value inside the table call, while being shown in the output as a non-NA value.

I know I could (and probably should) solve my problem by converting my vector to a factor, but I'd still really like to know what's going on here. Does anyone have an idea?

Cath asked Dec 03 '15 15:12


2 Answers

When factor matches up levels for a vector it converts its exclude list to the same type as the input vector:

exclude <- as.vector(exclude, typeof(x))

so if your exclude list has NaN and your vector is character, this happens:

as.vector(exclude, typeof(letters))
[1] NA    "NaN"

Oh dear. Now the real "NaN" strings will be excluded.

To fix, use exclude=NA in table (and factor if you are making factors that might hit this).
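As a minimal sketch of that fix, using the vector from the question (the names `counts` and `f` are only illustrative). Note that `exclude = NA` is already the default for factor, so spelling it out there only matters if you would otherwise pass something like `c(NA, NaN)`:

```r
ex_vec <- c("Non", "Non", "Nan", "Oui", "NaN", NA)

# Exclude only real missing values, so no NaN is coerced to the string "NaN"
counts <- table(ex_vec, exclude = NA)
print(counts)

# Same idea when building the factor yourself; exclude = NA is factor's
# default, it is written out here only to make the intent explicit
f <- factor(ex_vec, exclude = NA)
print(levels(f))
```

With this call the string "NaN" should be counted like any other value, and the real NA is still dropped.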

I do love this in the docs for factor:

 There are some anomalies associated with factors that have ‘NA’ as
 a level.  It is suggested to use them sparingly, e.g., only for
 tabulation purposes.

Reassuring...

Spacedman answered Oct 26 '22 05:10


My first idea was to look at the definition of table, which starts with:

> table
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no", 
    "ifany", "always"), dnn = list.names(...), deparse.level = 1) 
{

That sounds logical: by default, table excludes NA and NaN.

Digging into the table code, we see that if the input is not a factor, it is coerced to one (nothing new here, the documentation says so):

    else {
        a <- factor(a, exclude = exclude)

I didn't find anything else in table that could turn the "NaN" strings of the input into NA values.

So, looking into factor for the reason, we find the root cause:

> factor
function (x = character(), levels, labels = levels, exclude = NA, 
    ordered = is.ordered(x), nmax = NA) 
{
 [...] # Snipped for brevity
    exclude <- as.vector(exclude, typeof(x))
    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))] # defined in the snipped part above, is the sorted unique values of input vector, coerced to char.
    f <- match(x, levels)
 [...]
    f
}

There it is: the exclude parameter, even though it holds NA and NaN values, is coerced into a character vector.

So what happens is:

> ex_vec <- c("Non", "Non", "Nan", "Oui", "NaN", NA)
> excludes<-c(NA,NaN)
> as.vector(excludes,"character")
[1] NA    "NaN"
> match(ex_vec,as.vector(excludes,"character"))
[1] NA NA NA NA  2  1

The character string "NaN" matches because the exclude vector has been coerced to character before the comparison.
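The level-filtering step can be sketched in isolation to compare the two exclude values. This is only an illustration of the logic quoted above, not the actual factor source: `keep_levels` is a hypothetical helper, not part of base R, and `lv` stands in for the levels computed in the snipped part.

```r
ex_vec <- c("Non", "Non", "Nan", "Oui", "NaN", NA)

# Candidate levels: the unique non-missing values, already character here
lv <- unique(ex_vec[!is.na(ex_vec)])

# Hypothetical helper mimicking the filtering step quoted from factor:
# coerce `exclude` to character, then drop every level that matches it
keep_levels <- function(levels, exclude) {
  exclude_chr <- as.vector(exclude, "character")
  levels[is.na(match(levels, exclude_chr))]
}

dropped_nan <- keep_levels(lv, c(NA, NaN))  # NaN becomes "NaN", so that level is dropped
kept_nan    <- keep_levels(lv, NA)          # only NA is excluded, "NaN" survives

print(dropped_nan)
print(kept_nan)
```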

Tensibai answered Oct 26 '22 05:10