Proper way to test for NA in Julia DataFrames

What is the proper way to test if a value in a DataFrame is NA in the Julia DataFrames package?

So far I have found that typeof(var) == NAtype works, but is there a more elegant way to do it?

asked Jan 26 '15 15:01 by Skeppet


1 Answer

Using typeof(var) == NAtype for this is awkward, in particular because it is not vectorized.

The canonical way of testing for NA values is to use the (vectorized) function called isna.

Example

Let's generate a toy DataFrame with some NA values in the B column:

julia> using DataFrames

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | 2  |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | 8  |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | 16 |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

julia> df[[1,4,8], :B] = NA
NA

julia> df
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | NA |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | NA |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | NA |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

Now let's pretend we don't know the contents of our DataFrame and ask, for example, the following question:

Does column B contain any NA values?

The typeof approach won't work here, because the column is an array rather than a single value, so its type is never NAtype:

julia> typeof(df[:, :B]) == NAtype
false

The isna function is better suited: it returns one Boolean per element, and any reduces those to a single answer:

julia> any(isna(df[:, :B]))
true
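
For readers on a current Julia release, here is a minimal sketch of the same check. It assumes Julia 1.x, where missing and ismissing (both in Base) take the place of NA and isna. The vector B is only a stand-in for column B of the DataFrame above, so the snippet runs without any package.

```julia
# Assumption: Julia 1.x, where `missing` (in Base) plays the role of `NA`.
# `B` is a plain vector standing in for column B of the DataFrame above.
B = Union{Missing, Int}[2i for i in 1:10]
B[[1, 4, 8]] .= missing

has_missing  = any(ismissing, B)      # does B contain any missing value?
missing_rows = findall(ismissing, B)  # which positions are missing?
n_missing    = count(ismissing, B)    # how many values are missing?
```

Passing ismissing as a predicate to any, findall and count avoids building an intermediate Boolean array.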
answered Oct 09 '22 19:10 by jub0bs