
Sparse DataArray Xarray search

Tags:

python

pandas

python-xarray

Using DataArray objects in xarray, what is the best way to find all cells that have values != 0?

For example, in pandas I would do:

df.loc[df.col1 > 0]

In my specific example, I'm trying to look at 3-dimensional brain imaging data.

first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']

Looking at the documentation for xarray.DataArray.where, it seems I want something like this:

first_image_xarray.where(first_image_xarray.y + first_image_xarray.x  > 0,drop = True)[:,0,0]

But I still get arrays with zeros.

<xarray.DataArray (x: 140)>
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -0.,  0., -0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
Dimensions without coordinates: x

Also, a side question: why are there some negative zeros? Are these values rounded, so that -0. is actually equal to something like -0.009876?


asked Aug 11 '18 17:08

Liam Hanninen

1 Answer

(Answer to main question)

You are almost there, but a slight difference in syntax makes a big difference here. First, here is the solution for filtering >0 values using a "value-based" mask (use != 0 instead of > 0 if negative values should be kept as well).

# if you want to DROP values that do not satisfy the mask condition
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)

# if you want to KEEP values that do not satisfy the mask condition, as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)

On the other hand, your attempt did not work as you hoped because first_image_xarray.x refers to the index of the elements along the x dimension rather than to the values of the elements. Your condition is therefore False only where both the x and y indices are 0, so only the 1st element of the slice [:,0,0] should become nan instead of 0. In other words, you were creating an "index-based" mask.

The following small experiment (hopefully) illustrates this critical difference.

Suppose we have a DataArray that contains only 0 and 1, with the same shape as in the question, (140, 140, 96). First, let's mask it based on index, as the OP did:

import numpy as np
import xarray as xr

np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))


# with this "index-based" mask, only elements where the indices of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]

Out:
<xarray.DataArray (x: 140)>
array([ nan,   0.,   1.,   1.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,
         0.,   1.,   0.,   1.,   0.,   1.,   0.,   0.,   0.,   1.,   0.,   0.,
         1.,   1.,   0.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
         1.,   1.,   0.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   0.,   1.,
         1.,   0.,   0.,   0.,   1.,   1.,   1.,   0.,   0.,   1.,   0.,   0.,
         1.,   0.,   1.,   1.,   0.,   0.,   1.,   0.,   0.,   1.,   1.,   1.,
         0.,   0.,   0.,   1.,   1.,   0.,   1.,   0.,   1.,   1.,   0.,   0.,
         0.,   0.,   1.,   1.,   0.,   1.,   1.,   1.,   1.,   0.,   1.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   1.,   0.,   1.,   1.,   0.,   0.,
         0.,   0.,   1.,   0.,   1.,   0.,   0.,   0.,   0.,   1.,   0.,   1.,
         0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,   1.,   1.,   0.,   0.,
         0.,   1.,   0.,   0.,   1.,   0.,   0.,   1.])
Dimensions without coordinates: x

With the mask above, only the elements where the indices of both x and y are 0 turn into nan; the rest are neither changed nor dropped.
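To see why, it can help to inspect the mask itself. The following is a minimal sketch on a small hypothetical array (the name `demo` and its shape are assumptions made for illustration, not the OP's data). It assumes, as in the question, that the array has dimensions without coordinates, in which case `demo.x` is just the integer positions along x.

```python
import numpy as np
import xarray as xr

# Small hypothetical stand-in for the imaging volume (all zeros, no coordinates)
demo = xr.DataArray(np.zeros((4, 3, 2)), dims=("x", "y", "z"))

# Without coordinates, demo.x holds the positions 0..3, not any data values
x_positions = demo.x.values

# "Index-based" mask: a 2-D boolean array over (x, y) that never looks at the data
index_mask = demo.x + demo.y > 0
n_masked = int((~index_mask).sum())  # only (x=0, y=0) is False

# "Value-based" mask: a 3-D boolean array computed from the data itself
value_mask = demo != 0

print(x_positions)
print(index_mask.dims, n_masked)
print(value_mask.shape, bool(value_mask.any()))
```

Even though every cell of `demo` is zero, the index-based mask keeps all cells except those at x=0, y=0, which matches the behaviour described above.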

In contrast, the proposed solution masks the DataArray based on the values of its elements.

# with this "value-based" mask, all values that do not satisfy the mask condition are dropped
a[:,0,0].where(a[:,0,0] > 0, drop=True)

Out:
<xarray.DataArray (x: 65)>
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
Dimensions without coordinates: x

This successfully drops all the values that do not satisfy the mask condition, which is based on the values of the DataArray elements.
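The same value-based idea can be applied to the whole 3-D array rather than a single slice. The sketch below uses a small hypothetical array (`img` and its contents are assumptions for illustration) and shows three options: masking in place, listing the positions of non-zero cells with NumPy, and stacking the dimensions so that zero cells can actually be dropped.

```python
import numpy as np
import xarray as xr

# Small hypothetical 3-D array with two non-zero cells
data = np.zeros((3, 3, 2))
data[0, 1, 0] = 2.5
data[2, 2, 1] = -1.0
img = xr.DataArray(data, dims=("x", "y", "z"))

# Option 1: keep the 3-D shape and replace zero cells with nan
masked = img.where(img != 0)

# Option 2: integer positions of every non-zero cell, one (x, y, z) row per cell
positions = np.argwhere(img.values != 0)

# Option 3: flatten to one stacked dimension, then drop the zero cells
flat = img.stack(cell=("x", "y", "z"))
nonzero = flat.where(flat != 0, drop=True)

print(positions)
print(nonzero.values)
```

Dropping directly on the 3-D array only removes an index along a dimension when every cell at that index fails the condition, which is why the stacked form is used when only the non-zero cells are wanted.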

(Answer to side question)

As for the origin of -0 and 0 in the DataArray, one possibility is that small negative and positive values were rounded towards 0. A related discussion can be found in the question "How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?". Below is a tiny example of this case.

import numpy as np
import xarray as xr

xr_array = xr.DataArray([-0.1, 0.1])

# either the DataArray.round() method or np.round() can be used to round the values of a DataArray

xr_array.round()

Out:
<xarray.DataArray (dim_0: 2)>
array([-0.,  0.])
Dimensions without coordinates: dim_0

np.round(xr_array)

Out:
<xarray.DataArray (dim_0: 2)>
array([-0.,  0.])
Dimensions without coordinates: dim_0

As a side note, another way to get -0 displayed for a NumPy array is numpy.set_printoptions(precision=0), which hides the digits after the decimal point as shown below (but I believe this is not the case here, since you are using a DataArray):

import numpy as np

# the default is precision=8 (as of NumPy 1.15)
np.set_printoptions(precision=0)

np.array([-0.1, 0.1])

Out:
array([-0.,  0.])

Anyway, my best guess is that the -0 values were introduced by a manual, intentional step (such as rounding) in the data preparation and pre-processing phase rather than automatically.
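If it matters whether a displayed -0. is a true zero, this can be checked directly. The snippet below is a small sketch on made-up values (not the OP's data): negative zero compares equal to zero, `np.signbit` distinguishes it from positive zero, and adding 0.0 removes the sign.

```python
import numpy as np

# Made-up values: a negative zero, a positive zero, and a small negative number
values = np.array([-0.0, 0.0, -0.009876])

# Negative zero compares equal to zero, so a != 0 mask treats it as zero
is_zero = values == 0

# np.signbit tells -0.0 apart from 0.0
has_sign = np.signbit(values)

# Adding 0.0 turns -0.0 into 0.0 and leaves non-zero values unchanged
normalized = values + 0.0

print(is_zero)
print(has_sign)
print(normalized)
```

So a comparison such as `first_image_xarray == 0` on the cells in question would show whether they are exact zeros or small values hidden by the display precision.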

Hope this helps.

answered Sep 20 '22 15:09

gyoza
