
R factor NA vs <NA>

Tags: r, missing-data, na

I have the following data frame:

df1 <- data.frame(id = 1:20, fact1 = factor(rep(c('abc','def','NA',''),5)))
df1
   id fact1
1   1   abc
2   2   def
3   3    NA
4   4      
5   5   abc
6   6   def
7   7    NA
8   8      
9   9   abc
10 10   def
11 11    NA
12 12      
13 13   abc
14 14   def
15 15    NA
16 16      
17 17   abc
18 18   def
19 19    NA
20 20      

I'm trying to standardize all the missing values ('' and 'NA') so that they become NA. However, when I use this:

df1[df1 == ''] <- NA

there seem to be two kinds of NA:

df1
   id fact1
1   1   abc
2   2   def
3   3    NA
4   4  <NA>
5   5   abc
6   6   def
7   7    NA
8   8  <NA>
9   9   abc
10 10   def
11 11    NA
12 12  <NA>
13 13   abc
14 14   def
15 15    NA
16 16  <NA>
17 17   abc
18 18   def
19 19    NA
20 20  <NA>

Is there a best-practices method for dealing with this situation?

screechOwl asked Jun 14 '13 19:06



1 Answer

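Expanding on joran's comment: 'NA' in quotes is a string, so factor() treats it as an ordinary level, while a real missing value prints as <NA>: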

df1 <- data.frame(id = 1:5, fact1 = factor(c('abc','def', NA, 'NA','')))
df1
  id fact1
1  1   abc
2  2   def
3  3  <NA>
4  4    NA
5  5      
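
One way to confirm which entries R actually treats as missing is to call is.na() and look at the levels. A minimal sketch on a bare factor (the name f is only for illustration and is not part of the original example):

```r
# Same values as fact1 above; `f` is an illustrative name
f <- factor(c('abc', 'def', NA, 'NA', ''))

# Only the unquoted NA counts as missing
is.na(f)                # FALSE FALSE  TRUE FALSE FALSE

# The quoted 'NA' and the empty string are ordinary levels
'NA' %in% levels(f)     # TRUE
''   %in% levels(f)     # TRUE
```

So the entry that prints as NA without angle brackets is a regular value that happens to be spelled "NA".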

Replacing both '' and 'NA' with a real NA makes them all missing:

df1[df1 == '' | df1 == 'NA'] <- NA
df1
  id fact1
1  1   abc
2  2   def
3  3  <NA>
4  4  <NA>
5  5  <NA>
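
Note that after this replacement '' and 'NA' are still listed as (now unused) levels of the factor. If that matters, something like the following sketch should tidy them up. It works on the column directly rather than on the whole data frame, and df2 is just an illustrative copy of the example data:

```r
# Illustrative copy of the example data
df2 <- data.frame(id = 1:5, fact1 = factor(c('abc', 'def', NA, 'NA', '')))

# Turn the placeholder strings into real missing values
df2$fact1[df2$fact1 %in% c('', 'NA')] <- NA

# Drop the levels that no longer occur in the data
df2$fact1 <- droplevels(df2$fact1)

levels(df2$fact1)       # "abc" "def"
sum(is.na(df2$fact1))   # 3
```

If the data comes from a file, the na.strings argument of read.csv() (for example na.strings = c('', 'NA')) can do the same conversion at import time.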

Zach answered Sep 25 '22 02:09