xarray.Dataset.where() method force-changes dtype of DataArrays to float

Problem description

I have a dataset that contains integer data, and I'd like to select a subset of it by some criterion while preserving the integer dtype. It seems that xarray force-converts the integer data to a float dtype.

Example setup

Code

import numpy
import xarray

nums = numpy.random.randint(0, 100, 13)
names = numpy.random.choice(["babadook", "samara", "jason"], 13)
data_vars = {"num": xarray.DataArray(nums), "name": xarray.DataArray(names)}
dataset = xarray.Dataset(data_vars)
print(dataset)

Output

<xarray.Dataset>
Dimensions:  (dim_0: 13)
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    num      (dim_0) int64 93 99 49 35 92 14 41 57 28 59 74 1 15
    name     (dim_0) <U8 'babadook' 'samara' 'samara' 'samara' 'jason' ...

Example problem

Code

subdataset = dataset.where(dataset.num < 50, drop=True)
print(subdataset)

Output

<xarray.Dataset>
Dimensions:  (dim_0: 7)
Coordinates:
  * dim_0    (dim_0) int64 2 3 5 6 8 11 12
Data variables:
    num      (dim_0) float64 49.0 35.0 14.0 41.0 28.0 1.0 15.0
    name     (dim_0) <U32 'samara' 'samara' 'jason' 'babadook' 'jason' ...

asked Sep 12 '16 at 15:09 by Ray


1 Answer

That's because numpy, which xarray uses under the hood, has no way to represent NaN in an integer array. where() fills the values that fail the condition with NaN, so for most where() results the dtype has to be coerced to float.
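The promotion can be seen with plain numpy, without xarray. The snippet below is only an illustrative sketch: the array values and the names `nums`, `mask` and `masked` are made up for the example.

```python
import numpy

nums = numpy.array([93, 49, 35], dtype=numpy.int64)
mask = nums < 50

# Filling the masked-out slots with NaN forces the result to a float dtype,
# because an integer dtype has no bit pattern reserved for NaN.
masked = numpy.where(mask, nums, numpy.nan)
print(masked.dtype)

# Writing NaN directly into the integer array is rejected by numpy.
try:
    nums[0] = numpy.nan
    stored_nan = True
except ValueError as exc:
    stored_nan = False
    print("cannot store NaN in an int array:", exc)
```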

If drop=True and every masked value ends up being dropped, that constraint doesn't actually apply: the new array could retain its dtype, because no NaN values are needed. xarray doesn't do that at the moment, but it could be added as a feature.
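In the meantime, one possible workaround (not part of the answer above, so treat it as a sketch) is to skip where() and index the dataset by position with isel(), which only selects elements and never has to insert NaN. The data below is a small made-up sample modelled on the question, and the name `keep` is just for illustration.

```python
import numpy
import xarray

nums = numpy.array([93, 99, 49, 35, 92, 14, 41], dtype=numpy.int64)
names = numpy.array(
    ["babadook", "samara", "samara", "samara", "jason", "jason", "babadook"]
)
dataset = xarray.Dataset({"num": ("dim_0", nums), "name": ("dim_0", names)})

# Integer positions along dim_0 where the condition holds.
keep = numpy.flatnonzero(dataset["num"].values < 50)

# Positional selection keeps every variable's original dtype.
subdataset = dataset.isel(dim_0=keep)
print(subdataset["num"].dtype)
print(subdataset["name"].dtype)
```

This only works as a replacement for where(..., drop=True) when the condition is one-dimensional along the dimension being filtered, as it is in the question.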

answered Nov 14 '22 at 23:11 by Maximilian