
How to merge xArray datasets with conflicting coordinates

Let's say I have two data sets, each containing a different variable of interest and with incomplete (but not conflicting) indices:

In [1]: import xarray as xr, numpy as np
In [2]: ages = xr.Dataset(
          {'ages': (['kid_ids'], np.random.rand(3) * 20)},
          coords={'kid_names': (['kid_ids'], ['carl', 'kathy', 'gail']), 'kid_ids': [10, 14, 16]})
In [3]: heights = xr.Dataset(
          {'heights': (['kid_ids'], np.random.rand(3) * 160)},
          coords={'kid_names': (['kid_ids'], ['carl', 'keith', 'gail']), 'kid_ids': [10, 13, 16]})

This creates two data sets that seem like they should merge well:

In [4]: ages
Out[4]: 
<xarray.Dataset>
Dimensions:    (kid_ids: 3)
Coordinates:
  * kid_ids    (kid_ids) int32 10 14 16
    kid_names  (kid_ids) <U5 'carl' 'kathy' 'gail'
Data variables:
    ages       (kid_ids) float64 13.28 1.955 4.327
In [5]: heights
Out[5]: 
<xarray.Dataset>
Dimensions:    (kid_ids: 3)
Coordinates:
  * kid_ids    (kid_ids) int32 10 13 16
    kid_names  (kid_ids) <U5 'carl' 'keith' 'gail'
Data variables:
    heights    (kid_ids) float64 115.0 38.2 31.65

but they don't. Attempting ages.merge(heights) raises a ValueError:

ValueError: conflicting value for variable kid_names:
first value: <xarray.Variable (kid_ids: 4)>
array(['carl', nan, 'kathy', 'gail'], dtype=object)
second value: <xarray.Variable (kid_ids: 4)>
array(['carl', 'keith', nan, 'gail'], dtype=object)

Dropping the kid_names coordinate solves the problem:

In [7]: ages.reset_coords('kid_names', drop=True).merge(
          heights.reset_coords('kid_names', drop=True))
Out[7]:
<xarray.Dataset>
Dimensions:  (kid_ids: 4)
Coordinates:
  * kid_ids  (kid_ids) int64 10 13 14 16
Data variables:
    ages     (kid_ids) float64 0.4473 nan 6.45 6.787
    heights  (kid_ids) float64 78.42 78.43 nan 113.4

It seems as though the coordinates are being handled like DataArrays, in that any non-identical values raise an error. But shouldn't they be handled more like the base coordinates, e.g. extended to the superset of the two indices? Or is there another operation I should be doing?

I'm on Python 3.5, using xarray 0.7.2 and NumPy 1.10.4.

Michael Delgado, asked Apr 20 '16 00:04


1 Answer

This isn't currently easy to achieve in xarray, but it should be!

In fact, I think it should be safe to merge any non-conflicting values under most circumstances (unless the user requests higher scrutiny).

I opened a GitHub issue to track this: https://github.com/pydata/xarray/issues/835

Update: the merge method now supports this by default (with compat='no_conflicts'), so ages.merge(heights) should just work.
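
As a minimal sketch of what the update describes, the question's two datasets can be rebuilt and merged as below. The fixed numeric values are stand-ins for the random ones in the question, and compat and join are passed explicitly on the assumption that defaults may differ between xarray versions:

```python
import numpy as np
import xarray as xr

# Same layout as the question, with fixed values instead of random ones
# (the numbers themselves are illustrative, not from the original post).
ages = xr.Dataset(
    {"ages": (["kid_ids"], [13.0, 2.0, 4.0])},
    coords={
        "kid_names": (["kid_ids"], ["carl", "kathy", "gail"]),
        "kid_ids": [10, 14, 16],
    },
)
heights = xr.Dataset(
    {"heights": (["kid_ids"], [115.0, 38.0, 32.0])},
    coords={
        "kid_names": (["kid_ids"], ["carl", "keith", "gail"]),
        "kid_ids": [10, 13, 16],
    },
)

# compat="no_conflicts" accepts variables whose non-missing values agree;
# join="outer" extends kid_ids to the union of both indexes.
merged = ages.merge(heights, compat="no_conflicts", join="outer")
print(merged)
```

With these options kid_names should be filled in from whichever dataset has a name for each id, while ages and heights get NaN where a kid is missing from one side. Names that genuinely disagree for the same kid_ids value should still raise an error rather than being silently overwritten.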

shoyer, answered Oct 01 '22 06:10