What is the difference between <NA> and NA?

Tags:

r

missing-data

na

I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However when I view the factor I get something like this:

head(SMOKE)
# N N <NA> Y Y N
# Levels: Y N

Why is R displaying NA as <NA>? And is there a difference?

asked Apr 27 '13 15:04

oort

2 Answers

When you are dealing with factors, an NA wrapped in angle brackets ( <NA> ) indicates that it is in fact NA, i.e. a missing value.

When it appears as NA without brackets, it is not a missing value, but rather a proper factor level whose label is the string "NA".

# Note a 'real' NA and a string with the word "NA"
x <- factor(c("hello", NA, "world", "NA"))

x
[1] hello <NA>  world NA   
Levels: hello NA world      <~~ The string appears as a level, the actual NA does not. 

as.numeric(x)              
[1]  1 NA  3  2            <~~ The string has a numeric value (here, 2, alphabetically)
                               The NA's numeric value is just NA
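The same distinction can be checked programmatically rather than by reading the printed output. The following is a minimal sketch using the same `x` as above; the variable names `missing_mask` and `string_mask` are only illustrative.

```r
x <- factor(c("hello", NA, "world", "NA"))

# is.na() is TRUE only for the real missing value (the one printed as <NA>)
missing_mask <- is.na(x)

# Comparing against the string "NA" matches only the level labelled "NA";
# for the truly missing element the comparison itself yields NA
string_mask <- x == "NA"

print(missing_mask)          # FALSE  TRUE FALSE FALSE
print(string_mask)           # FALSE    NA FALSE  TRUE

print("NA" %in% levels(x))   # TRUE:  the string is a level
print(anyNA(levels(x)))      # FALSE: the missing value is not a level
```

Note that `x == "NA"` never finds the missing value, so `is.na()` is the check to use for real NAs.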

Edit to answer @Arun's question:

R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA. Hence the difference you see when displaying df versus df$y. Example:

df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)

Note the two different styles of NA:

> df
  x    y
1 1    a
2 2 <NA>
3 3    c
4 4   NA

However, if we look at just df$y, the string is quoted and the missing value is not:

df$y
[1] "a"  NA   "c"  "NA"

But, if we remove the quotation marks (similar to what we see when printing a data.frame to the console):

print(df$y, quote=FALSE)
[1] a    <NA> c    NA

And thus, we once again have the distinction of NA via the angled brackets.
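For the character column, the two kinds of "NA" can also be located by row instead of by how they print. This is a small sketch built on the same `df`; `rows_missing` and `rows_string` are hypothetical names introduced here.

```r
df <- data.frame(x = 1:4,
                 y = c("a", NA_character_, "c", "NA"),
                 stringsAsFactors = FALSE)

# Row holding the real missing value (shown as <NA> in the data.frame print)
rows_missing <- which(is.na(df$y))

# Row holding the two-letter string "NA"; which() ignores the NA
# that the comparison produces for the missing element
rows_string <- which(df$y == "NA")

print(rows_missing)   # 2
print(rows_string)    # 4
```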

answered Sep 24 '22 15:09

Ricardo Saporta

It is just the way that R displays NA in a factor:

> as.factor(NA)
[1] <NA>
Levels: 
> 
> f <- factor(c(1:3, NA))
> levels(f)
[1] "1" "2" "3"
> f
[1] 1    2    3    <NA>
Levels: 1 2 3
> is.na(f)
[1] FALSE FALSE FALSE  TRUE

Presumably this is how R differentiates between NA and "NA" when printing a factor, since a factor prints without quotes, even for character labels/levels:

> f2 <- factor(c("NA",NA))
> f2
[1] NA   <NA>
Levels: NA
> is.na(f2)
[1] FALSE  TRUE
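When counting values, the two cases also stay separate as long as missing values are included in the tally. A minimal sketch, assuming the same `f2`; `table()` with `useNA = "ifany"` is base R, while the names `n_missing` and `n_string` are illustrative.

```r
f2 <- factor(c("NA", NA))

# Tabulate the factor, keeping a separate cell for missing values
counts <- table(f2, useNA = "ifany")
print(counts)

# Count each case explicitly
n_missing <- sum(is.na(f2))                 # real missing values
n_string  <- sum(f2 == "NA", na.rm = TRUE)  # elements labelled "NA"

print(n_missing)   # 1
print(n_string)    # 1
```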

answered Sep 24 '22 15:09

Gavin Simpson
