Using DataArray objects in xarray, what is the best way to find all cells that have values != 0?
For example, in pandas I would do:
df.loc[df.col1 > 0]
My specific example: I'm trying to look at 3-dimensional brain imaging data.
first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']
Looking at the documentation for xarray.DataArray.where it seems I want something like this:
first_image_xarray.where(first_image_xarray.y + first_image_xarray.x > 0,drop = True)[:,0,0]
But I still get arrays with zeros.
<xarray.DataArray (x: 140)>
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -0., 0., -0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Dimensions without coordinates: x
Also, a side question: why are there some negative zeros? Are these values rounded, so that -0. is actually equal to something like -0.009876?
(Answer to main question)
You are almost there, but a slight difference in syntax matters a lot here. On one hand, here is the solution to filter values > 0 using a "value-based" mask.
# if you want to DROP values which do not satisfy the mask condition
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)
or
# if you want to KEEP values which do not satisfy the mask condition, as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)
On the other hand, your attempt did not work as you hoped because first_image_xarray.x refers to the index of the elements along the x dimension rather than to the values of the elements. Thus only the first element of your output should be nan instead of 0, because it is the only element in the slice [:,0,0] that does not satisfy the mask condition. In other words, you were creating an "index-based" mask.
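To see the two kinds of mask side by side, it can help to inspect the boolean conditions themselves on a tiny array. This is only a sketch: the array `small` and its values are made up for illustration, and explicit integer coordinates are assigned (an assumption, since the array in the question has none) so that `small.x` and `small.y` are unambiguous.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array: only the cell at x=1, y=1 is nonzero
small = xr.DataArray(
    np.array([[0, 0], [0, 7]]),
    dims=("x", "y"),
    coords={"x": [0, 1], "y": [0, 1]},  # assumed explicit integer coordinates
)

# "index-based" mask: compares positions, not data; False only at x=0, y=0
index_mask = (small.x + small.y) > 0

# "value-based" mask: compares the data itself; True only where the value is > 0
value_mask = small > 0

index_masked = small.where(index_mask)  # nan only at x=0, y=0
value_masked = small.where(value_mask)  # nan everywhere except x=1, y=1

print(index_mask.values)
print(value_mask.values)
```

The index-based mask keeps three of the four cells even though two of them hold 0, while the value-based mask keeps only the single nonzero cell.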
The following small experiment (hopefully) articulates this critical difference.
Suppose we have a DataArray which consists of only 0 and 1 (its shape matches that of the original post (OP) of the question, (140, 140, 96)). First, let's mask it based on index, as the OP did:
import numpy as np
import xarray as xr
np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))
# with this "index-based" mask, only elements where the indices of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]
Out:
<xarray.DataArray (x: 140)>
array([ nan, 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 0., 0.,
1., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.,
0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 1.])
Dimensions without coordinates: x
With the mask above, only the element where the indices of both x and y are 0 turns into nan, and the rest has not been changed or dropped at all.
In contrast, the proposed solution masks the DataArray based on the values of the DataArray elements.
# with this "value-based" mask, all the values which do not satisfy the mask condition are dropped
a[:,0,0].where(a[:,0,0] > 0, drop=True)
Out:
<xarray.DataArray (x: 65)>
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dimensions without coordinates: x
This successfully dropped all the values which do not satisfy the mask condition, based on the values of the DataArray elements.
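The same value-based idea can be applied to the whole 3D array rather than a single slice, which is closer to the "all cells != 0" wording of the question. The following is a minimal sketch on a small made-up array (`img` and its values are hypothetical); it keeps the shape with `where`, and falls back to plain NumPy (`np.argwhere`) to list the positions of the nonzero cells.

```python
import numpy as np
import xarray as xr

# Hypothetical 4x3x2 volume with two nonzero cells
data = np.zeros((4, 3, 2))
data[1, 2, 0] = 5.0
data[3, 0, 1] = -2.0
img = xr.DataArray(data, dims=("x", "y", "z"))

# value-based mask over the full array
mask = img != 0

# same shape as img, with nan wherever the value is 0
masked = img.where(mask)

# positions (x, y, z) and values of the nonzero cells, via NumPy
nonzero_idx = np.argwhere(mask.values)
nonzero_vals = img.values[mask.values]

print(nonzero_idx)
print(nonzero_vals)
```

Using `!= 0` rather than `> 0` also keeps negative values, which matters if the image data is signed.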
(Answer to side question)
As for the origin of -0 and 0 in the DataArray, one possibility is that values were rounded towards 0 from the negative or positive side. A related discussion can be found here: How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy? Below is a tiny example of this case.
import numpy as np
import xarray as xr
xr_array = xr.DataArray([-0.1, 0.1])
# you can use either xr.DataArray.round() or np.round() for rounding values of DataArray
xr.DataArray.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
np.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
As a side note, another way to get -0 displayed for a NumPy array is numpy.set_printoptions(precision=0), which hides the digits after the decimal point as shown below (but I believe this is not the case here since you are using a DataArray):
import numpy as np
# default value is precision=8 in ver1.15
np.set_printoptions(precision=0)
np.array([-0.1, 0.1])
Out:
array([-0., 0.])
Anyway, my best guess is that the conversion to -0 was done manually and intentionally, rather than automatically, in the data preparation and pre-processing phase.
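One way to check whether a displayed -0. is an exact negative zero or a small negative number hidden by the print settings is to test the values directly. This is a sketch on made-up data (`arr` is hypothetical), using `np.signbit` together with an equality test against 0.

```python
import numpy as np
import xarray as xr

# Hypothetical data: an exact negative zero, a positive zero,
# a small negative number, and another positive zero
arr = xr.DataArray([-0.0, 0.0, -0.004, 0.0])

# True where the value is exactly zero (-0.0 == 0.0 is True in floating point)
is_exact_zero = (arr == 0).values

# True where the sign bit is set, including for negative zero
has_neg_sign = np.signbit(arr.values)

# True only for exact negative zeros
is_neg_zero = is_exact_zero & has_neg_sign

print(is_exact_zero)
print(is_neg_zero)
```

If `is_exact_zero` is True for a cell shown as -0., the value really is zero and a `!= 0` or `> 0` mask will treat it as such; if it is False, the cell holds a small nonzero number that the display has shortened.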
Hope this helps.