I am looking for some guidance on when to use missing, nothing, undef, and NaN in Julia.
For example, all seem like reasonable choices for pre-allocating an array or returning from a try/catch.
TLDR:
If you're working in statistics, chances are that you want missing to signal the absence of a particular data point in a collection.
If you want to define an array of floating-point numbers, but initialize individual elements later, you might want to use undef for performance reasons (to avoid spending time setting elements to a value which will get overridden afterwards):
Vector{Float64}(undef, n)
In the same situation, but following an approach less oriented towards performance and more towards safety, you can also initialize all elements to NaN in order to take advantage of the propagating behavior of NaN to help identify bugs that could happen if you forget to set some value in the array:
fill(NaN, n)
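As a minimal sketch of how the NaN approach can surface such a bug, consider the following, where fill_odd! is a hypothetical helper (not part of Julia) that deliberately forgets to set half of the elements:

```julia
# Hypothetical helper: sets only the odd positions and "forgets" the others.
function fill_odd!(v)
    for i in 1:2:length(v)
        v[i] = 1.0
    end
    return v
end

v = fill_odd!(fill(NaN, 4))

any(isnan, v)   # true: at least one element was never set
isnan(sum(v))   # true: the leftover NaN propagates through the sum
```

With undef instead of fill(NaN, 4), the forgotten elements would hold arbitrary values and the mistake could easily go unnoticed.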
You'll probably encounter nothing in some parts of Julia's API, where it signals cases in which no meaningful value can be computed. But it is generally not used in arrays otherwise containing numeric data (which seems to be your use case here).
Here is my take on the differences between these options:
missing is used to represent missing values in a statistical sense, i.e. values that theoretically exist, but that you don't know. missing is similar in spirit (and in behavior, in most cases) to NA in R. A defining feature of missing values is that you can use them in computations:
julia> x = 1 # x has a known value: 1
1
julia> y = missing # y has a value, but it is unknown
missing
julia> z = x * y # no error: z has a value, that just happens to be unknown
missing # (as a consequence of not knowing the value of y)
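To complement the example above, here is a short sketch of a few Base functions that are commonly used when computing with missing values. The variable names are only illustrative:

```julia
x = [1, missing, 3]

sum(x)                # missing: the unknown value propagates
sum(skipmissing(x))   # 4: missing entries are skipped
ismissing(x[2])       # true
coalesce.(x, 0)       # [1, 0, 3]: each missing replaced by a default value
missing == missing    # missing, not true: test with ismissing instead of ==
```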
One important characteristic of missing is that it has its own specific type: Missing. This means in particular that arrays containing missing values among other numeric values are not homogeneous in type:
julia> [1, missing, 3]
3-element Array{Union{Missing, Int64},1}: # not Array{Int64, 1}
1
missing
3
Note that, although the Julia compiler has become very good at handling arrays whose element type is such a small union, there is an inherent performance cost to having elements of different types, since the type of a given element cannot be known in advance.
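If the union element type becomes a concern, one possible approach (a sketch, assuming the missing entries can simply be dropped) is to collect the non-missing values into a homogeneous array:

```julia
x = [1, missing, 3]
eltype(x)                     # Union{Missing, Int64} on a 64-bit system

y = collect(skipmissing(x))   # [1, 3]
eltype(y)                     # Int64 on a 64-bit system

nonmissingtype(eltype(x))     # the element type with Missing removed
```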
nothing also has its own type: Nothing. In contrast to missing, it tends to be used for things that have no value. This is why, in contrast to missing, computing with nothing does not make sense, and errors out:
julia> 3*nothing
ERROR: MethodError: no method matching *(::Int64, ::Nothing)
nothing is primarily used as the return value of functions that don't return anything, either because they only have side-effects, or because they could not compute any meaningful result:
julia> @show println("OK") # Only side effects
OK
println("OK") = nothing
julia> @show findfirst('a', "Hello") # No meaningful result
findfirst('a', "Hello") = nothing
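Since such a result cannot be used in computations, the caller usually has to check for it explicitly. A minimal sketch using isnothing and something from Base (the fallback value 0 and the message strings are arbitrary choices made for illustration):

```julia
idx = findfirst('a', "Hello")

isnothing(idx)    # true
idx === nothing   # true: an equivalent check

# Branch on the result before using it
msg = idx === nothing ? "not found" : "found at index $idx"

# something returns its first argument that is not nothing
something(findfirst('l', "Hello"), 0)   # 3
something(findfirst('a', "Hello"), 0)   # 0
```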
Another notable use of nothing is in function arguments or object fields for which a value is not always provided. This would typically be represented in the type system as a Union{MeaningfulType, Nothing}. For example, with the following definition of a binary tree structure, a leaf (which, by definition, is a node that has no children) would be represented as a node whose children are nothing:
struct TreeNode
    child1 :: Union{TreeNode, Nothing}
    child2 :: Union{TreeNode, Nothing}
end
leaf = TreeNode(nothing, nothing)
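As a sketch of how code working on such a structure might handle the nothing case, one option is to dispatch on Nothing. The functions isleaf and count_nodes below are hypothetical helpers written for this illustration:

```julia
struct TreeNode
    child1 :: Union{TreeNode, Nothing}
    child2 :: Union{TreeNode, Nothing}
end

# A leaf is a node with no children
isleaf(node::TreeNode) = node.child1 === nothing && node.child2 === nothing

# Dispatch handles the "no child" case without explicit branching
count_nodes(::Nothing) = 0
count_nodes(node::TreeNode) = 1 + count_nodes(node.child1) + count_nodes(node.child2)

leaf = TreeNode(nothing, nothing)
root = TreeNode(leaf, TreeNode(leaf, nothing))

isleaf(leaf)        # true
isleaf(root)        # false
count_nodes(root)   # 4
```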
Unlike the previous two, NaN does not have its own specific type: NaN is merely a specific value of the Float64 type (and NaN32 similarly exists for Float32). As you probably know, these values normally appear as the result of undefined operations (such as 0/0), and have a very special meaning in floating-point arithmetic, which makes them propagate (in more or less the same way as missing values). But apart from that arithmetic behavior, these are normal floating-point values. In particular, a vector of floating-point values may contain NaNs without it affecting its type:
julia> [1., NaN, 2.]
3-element Array{Float64,1}: # Note how this differs from the example with missing above
1.0
NaN
2.0
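A short sketch of the arithmetic behavior described above, and of how NaN values can be detected or filtered out with isnan:

```julia
v = [1., NaN, 2.]

0/0                      # NaN: result of an undefined operation
NaN == NaN               # false: NaN is not equal to anything, even itself
isnan(v[2])              # true: the reliable way to test for NaN
sum(v)                   # NaN: propagates through arithmetic
sum(filter(!isnan, v))   # 3.0 once the NaN is filtered out
```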
undef is very different from everything that has been mentioned so far. It is not really a value (at least not in the sense of a number having a value), but rather a "flag" that one can pass to array constructors to tell Julia not to initialize the values in the array (generally for performance considerations). In the following example, the array elements will not be set to any specific value but, since there is no such thing as a number without any value in Julia, elements will have arbitrary values (coming from whatever happens to be in memory where the vector gets allocated).
julia> Vector{Float64}(undef, 3)
3-element Array{Float64,1}:
6.94567437726575e-310
6.94569509953624e-310
6.94567437549977e-310
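Because these initial values are arbitrary, an array created with undef is only safe to read once every element has been written. A minimal sketch of that pattern, where squares is a hypothetical function used only for illustration:

```julia
# Hypothetical example: allocate without initializing, then set every element.
function squares(n)
    out = Vector{Float64}(undef, n)   # contents are arbitrary at this point
    for i in 1:n
        out[i] = i^2                  # each element is written exactly once
    end
    return out
end

squares(3)   # [1.0, 4.0, 9.0]
```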
When elements are of a more complex type (in technical terms: a non-isbits type) and a distinction can be made between initialized and uninitialized elements, Julia denotes the latter with #undef:
julia> mutable struct Foo end
julia> Vector{Foo}(undef, 3)
3-element Array{Foo,1}:
#undef
#undef
#undef
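For such arrays, isassigned from Base can be used to check whether a given element has been initialized. A short sketch, reusing the Foo type from the example above:

```julia
mutable struct Foo end

v = Vector{Foo}(undef, 3)

isassigned(v, 1)   # false: element 1 has not been set yet
v[1] = Foo()
isassigned(v, 1)   # true
isassigned(v, 2)   # false

# Reading an unassigned element such as v[2] throws an UndefRefError
```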