In numpy there are two ways to mark missing values: I can either use a NaN or a masked array. I understand that using NaNs is (potentially) faster, while a masked array offers more functionality (which?).
I guess my question is: when, if ever, should I use one over the other?
What is the use case of np.NaN in a regular array vs. a masked array?
I am sure the answer must be out there but I could not find it...
Keep in mind that the strange np.nan behaviours mentioned by jrmyp include unexpected results from functions in statsmodels (e.g. ttest) or in numpy itself (e.g. average). From experience, most of those functions have workarounds for NaNs, but tracking them down can drive you mad for a while. This seems like a reason to mask arrays whenever possible.
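To make that concrete, here is a small sketch of the difference. The array values and weights are toy numbers of my own (not from the question), and it only uses standard numpy calls (`np.average`, `np.nanmean`, `np.ma.masked_invalid`, `np.ma.average`), so treat it as an illustration rather than a full comparison:

```python
import numpy as np

# Toy data (hypothetical values) with one missing entry marked as NaN.
data = np.array([1.0, 2.0, np.nan, 4.0])

# A plain reduction lets the NaN propagate into the result.
plain_mean = np.average(data)

# The NaN-aware variant skips the missing entry, but you have to
# remember to call it explicitly everywhere.
nan_aware_mean = np.nanmean(data)

# A masked array carries the "missing" information with the data,
# so the masked reductions skip the entry without special-casing.
masked = np.ma.masked_invalid(data)
masked_mean = np.ma.average(masked)

# Weighted averages also ignore the weight of the masked entry.
weights = np.array([1.0, 1.0, 1.0, 2.0])
weighted_mean = np.ma.average(masked, weights=weights)

print(plain_mean, nan_aware_mean, masked_mean, weighted_mean)
print(masked.count())  # number of unmasked (valid) entries
```

Note that `np.nanmean` gets you the same number as the masked version here, so the choice is mostly about whether you would rather remember the nan-aware function at every call site or carry the mask along with the data.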