Currently, my code heavily uses structured masked arrays with multidimensional dtypes, with dozens of fields and item sizes of many kilobytes. It appears that xarray could be a great alternative, but when I try to pass it a masked array, it changes its dtype to float:
In [137]: x = arange(30, dtype="i1").reshape(3, 10)
In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))}, coords={"x": range(3), "y":
     ...: range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...
This is undesirable for me because (1) the memory consumption of my dataset, which is already large, will explode, and (2) many of my integer dtypes are bit fields, which must not be represented as floats. Although an int32 bit field can be represented losslessly as a float64, converting back and forth is ugly and error-prone.
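To make the round trip concrete, here is a minimal NumPy-only sketch. The flag values are made up for illustration; the point is that an int32 survives the cast to float64 unchanged, but the array doubles in size and has to be cast back explicitly before any bitwise operation.

```python
import numpy as np

# Hypothetical bit-field values, including the extremes of int32.
i32 = np.iinfo(np.int32)
flags = np.array([0, 1, 0b1010, i32.max, i32.min], dtype=np.int32)

# This is the representation the data ends up in after promotion to float.
as_float = flags.astype(np.float64)

# Bitwise operators are not defined for float arrays, so an explicit cast
# back to the integer dtype is needed before testing individual bits.
restored = as_float.astype(np.int32)

round_trip_ok = bool(np.array_equal(flags, restored))  # float64 holds every int32 exactly
bit_set = (restored & 0b10) != 0                       # test bit 1 of each flag word
memory_ratio = as_float.nbytes / flags.nbytes          # 8 bytes vs. 4 bytes per item

print(round_trip_ok, bit_set, memory_ratio)
```

For the int8 array in the session above, the same promotion to float64 costs eight times the memory.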
Is it possible to use xarray.Dataset with masked arrays while preserving integer dtypes?
Edit: It appears the problem occurs in _maybe_promote. See also the GitHub issue.
Unfortunately, xarray does not support masked arrays or any form of integer dtypes with missing values. The reasons for this choice are the same reasons that pandas does not (currently) support integer NAs, as described in the pandas docs under Caveats and Gotchas. We would need an integer dtype for NumPy arrays that supports missing values, which unfortunately does not exist.
I agree that this is not a very satisfactory solution for images with missing values, though in many cases I have found it suffices to work with non-masked integer data and convert to float (masking missing values) only when necessary for arithmetic (e.g., making use of .fillna()).
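As a rough sketch of that workflow, one option is to store the filled integer data next to a separate boolean validity variable and apply the mask only when doing arithmetic. The sentinel value -1 and the variable name count_valid are assumptions made for this example, not xarray conventions, and .where() is used here for the masking step.

```python
import numpy as np
import numpy.ma as ma
import xarray as xr

x = np.arange(30, dtype="i1").reshape(3, 10)
masked = ma.masked_where(x % 5 > 3, x)

ds = xr.Dataset(
    {
        # Plain integer data; -1 is an arbitrary placeholder in masked slots.
        "count": (["x", "y"], masked.filled(-1)),
        # Hypothetical companion variable: True where the value is valid.
        "count_valid": (["x", "y"], ~ma.getmaskarray(masked)),
    },
    coords={"x": range(3), "y": range(10)},
)

# Note: ds.count is a Dataset method, so the variable needs item access.
count_dtype = ds["count"].dtype  # the integer dtype is preserved

# Promote to float (invalid entries become NaN) only for the computation.
mean_valid = float(ds["count"].where(ds["count_valid"]).mean())

print(count_dtype, mean_valid)
```

For bit fields, a separate mask is safer than relying on the sentinel alone, since every bit pattern may be a legitimate value.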
With regard to memory usage, I recommend trying xarray with dask, which allows most array operations to be performed in a streaming or distributed fashion.