
Subset a factor by NA levels

I have a factor in R, with an NA level.

set.seed(1)
x <- sample(c(1, 2, NA), 25, replace=TRUE)
x <- factor(x, exclude = NULL)
> x
 [1] 1    2    2    <NA> 1    <NA> <NA> 2    2    1    1   
[12] 1    <NA> 2    <NA> 2    <NA> <NA> 2    <NA> <NA> 1   
[23] 2    1    1   
Levels: 1 2 <NA>

How do I subset that factor by the <NA> level? Neither of the methods I tried worked.

> x[is.na(x)]
factor(0)
Levels: 1 2 <NA>
> x[x=='<NA>']
factor(0)
Levels: 1 2 <NA>
Zach asked Feb 03 '23 09:02


2 Answers

It surprises me that your attempts didn't work, but this does:

x[is.na(levels(x)[x])]

I got there by looking at str(x) and seeing that it is the levels that are NA, not the underlying codes:

str(x)
 Factor w/ 3 levels "1","2",NA: 1 2 2 3 1 3 3 2 2 1 ...
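
A minimal sketch of the same idea, using a small hand-written factor rather than the sampled one from the question (the values and the names by_levels, by_char, by_code and na_code are assumptions made for illustration). It shows two other ways to reach the NA level that should give the same subset:

```r
# Small deterministic example (illustrative values, not from the question)
x <- factor(c(1, 2, NA, 2, NA), exclude = NULL)

# Approach from above: look up each element's level label, then test the labels
by_levels <- x[is.na(levels(x)[x])]

# as.character() performs the same code-to-label lookup
by_char <- x[is.na(as.character(x))]

# Or compare the integer codes with the position of the NA level
na_code <- which(is.na(levels(x)))
by_code <- x[as.integer(x) == na_code]

identical(by_levels, by_char)  # TRUE
identical(by_levels, by_code)  # TRUE
length(by_levels)              # 2
```

All three variants rely on the NA living in levels(x) rather than in the codes, so they only apply to factors built with exclude = NULL (or otherwise given an explicit NA level).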
Ben Bolker answered Feb 05 '23 14:02


As a follow-up to Ben's answer:

str(x) shows you the problem. Internally, a factor is stored as a vector of integer codes plus a "lookup" table of levels. Here the NA sits in the levels, not in the codes, so:

> any(is.na(x))
[1] FALSE

but

> any(is.na(levels(x)))
[1] TRUE

and, as Ben showed, to print the actual values of the vector:

> levels(x)[x]
 [1] "1" "2" "2" NA  "1" NA  NA  "2" "2" "1" "1" "1" NA  "2" NA  "2" NA  NA  "2" NA  NA  "1" "2" "1" "1"

versus

> x
 [1] 1    2    2    <NA> 1    <NA> <NA> 2    2    1    1    1    <NA> 2    <NA> 2    <NA> <NA> 2    <NA> <NA> 1    2    1    1
Levels: 1 2 <NA>
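
The codes-plus-lookup storage described above can be inspected directly. This is a rough sketch on a small hand-written factor (the values and the names codes and labs are assumptions for illustration, not from the question):

```r
# Small deterministic example (illustrative values, not from the question)
x <- factor(c(1, 2, NA, 2, NA), exclude = NULL)

codes <- as.integer(x)  # internal integer codes: 1 2 3 2 3
labs  <- levels(x)      # lookup table: "1" "2" NA

anyNA(codes)  # FALSE: every element has a valid code
anyNA(labs)   # TRUE: the third level itself is NA

# Indexing the lookup table by the codes recovers the actual values
labs[codes]   # "1" "2" NA "2" NA
```

This is why is.na(x) finds nothing here: it tests the codes, and the code 3 is a perfectly valid integer that merely points at an NA label.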
Justin answered Feb 05 '23 13:02