xarray reverse interpolation (on coordinate, not on data)

Question

I have a the following DataArray

arr = xr.DataArray([[0.33, 0.25],[0.55, 0.60],[0.85, 0.71],[0.92,0.85],[1.50,0.96],[2.5,1.1]],[('x',[0.25,0.5,0.75,1.0,1.25,1.5]),('y',[1,2])])

This gives the following output

<xarray.DataArray (x: 6, y: 2)>
array([[0.33, 0.25],
       [0.55, 0.6 ],
       [0.85, 0.71],
       [0.92, 0.85],
       [1.5 , 0.96],
       [2.5 , 1.1 ]])
Coordinates:
  * x        (x) float64 0.25 0.5 0.75 1.0 1.25 1.5
  * y        (y) int32 1 2

or sorted below with x and output (z) next to each other for convenience.

x         z (y=1)   z(y=2)
0.25      0.33      0.25
0.50      0.55      0.60
0.75      0.85      0.71
1.00      0.92      0.85
1.25      1.50      0.96
1.50      2.50      1.10

The data I have is the result of several input values. One of them is the x value. There are several other dimensions (such as y) for other input values. I want to know when my output value (z) is growing larger than 1.00, keeping the other dimensions fixed and vary the x-value. In the above 2-dimensional example, I would like to get the answer [1.03 1.32]. Because a value of 1.03 for x will give me 1.00 for z when y=1 and a value of 1.32 for x will give me 1.00 for z when y=2.

edit: Since the output z will grow with increasing x, there is only one point where z will have 1.0 as an output.

Is there any efficient way to achieve this with xarray? My actual table is much larger and has 4 inputs (dimensions).

Thank you for any help!

jojo · Accepted Answer

xarray has a very handy function for this: xr.interp which will do a piecewise linear interpolation of an xarray.

In your case, you can use it to obtain a piecewise interpolation of the (x, y1) and (x, y1) points. Once this is done, the only thing that remains to do is getting the value of your interpolated x array that is associated to the closes value of your interpolated y1/y2/.. array to the target number (1.00 in your example).

Here is how this could look like:

y_dims = [0, 1,] 
target_value = 1.0
# create a 'high resolution` version of your data array:
arr_itp = arr.interp(x=np.linspace(arr.x.min(), arr.x.max(), 10000))
for y in y_dims:
    # get the index of closest data
    x_closest = np.abs(arr_itp.isel(y=y) - target_value).argmin()
    print(arr_itp.isel(y=y, x=x_closest))

>>> <xarray.DataArray ()>
>>> array(0.99993199)
>>> Coordinates:
>>>     y        int64 1
>>>     x        float64 1.034
>>> <xarray.DataArray ()>
>>> array(1.00003)
>>> Coordinates:
>>>     y        int64 2
>>>     x        float64 1.321

While this works, it is not a really efficient way to approach the problem and here are 2 reasons why not:

Using xr.interp does a piecewise interpolation of the entire DataArray. Howerver, we only ever need the interpolation between the two points closest to your target value.
Here, an interpolation is a straight line between 2 points. But if we know one coordinate of a point on that line (y=1.00) then we can simply calculate the other coordinate by resolving the linear equation of the straight line and the problem is solved in a few arithmetic operations.

Taking these reasons into account we can develop a more efficient solution to your problem:

# solution of linear function between two points (2. reason)
def lin_itp(p1,p2,tv):
    """Get x coord of point on line

    Determine the x coord. of a point (x, target_value) on the line
    through the points p1, p2.

    Approach:
      - parametrize x, y between p1 and p2: 
          x = p1[0] + t*(p2[0]-p1[0])
          y = p1[1] + t*(p2[1]-p1[1])
      - set y = tv and resolve 2nd eqt for t
          t = (tv - p1[1]) / (p2[1] - p1[1])
      - replace t in 1st eqt with solution for t
          x = p1[0] + (tv - p1[1])*(p2[0] - p1[0])/(p2[1] - p1[1])
    """
    return float(p1[0] + (tv - p1[1])*(p2[0] - p1[0])/(p2[1] - p1[1])) 

# target value:
t_v = 1.0
for y in [0, 1]:
    arr_sd = arr.isel(y=y)
    # get index for the value closest to the target value (but smaller)
    s_udim = int(xr.where(arr_sd - t_v <=0, arr_sd, arr_sd.min()).argmax())
    # I'm explicitly defining the two points here
    ps_itp = arr_sd[s_udim:s_udim+2]
    p1, p2 = (ps_itp.x[0], ps_itp[0]), (ps_itp.x[1], ps_itp[1])
    print(lin_itp(p1,p2,t_v))

>>> 1.0344827586206897
>>> 1.3214285714285714

Hoogendijk · Answer

The problem I had with jojo's answer is that it is difficult to expand it in many dimensions and to keep the xarray structure. Hence, I decided to look further into this. I used some ideas from jojo's code to make below answer.

I make two arrays, one with the condition that the values are smaller than what I look for, and one with the condition they need to be larger. I shift the second one in the x-direction by minus 1. Now I combine them in a normal linear interpolation formula. The two arrays only have values overlapping at the 'edge' of the condition. If not shifted by -1, no values would overlap. In the final line I sum over the x-direction and since al other values are NaN, I extract the correct value and remove the x-direction from the DataArray in the process.

def interpolate_dimension_x(arr, target_value, step):
    M0 = arr.where(arr - target_value <= 0)
    M1 = arr.where(arr - target_value > 0).shift(x=-1)

    work_mat = M0.x + step * (target_value - M0) / (M1 - M0)

    return work_mat.sum(dim='x')

interpolate_dimension_x(arr, 1, 0.25)

>>> <xarray.DataArray (y: 2)>
array([1.034483, 1.321429])
Coordinates:
  * y        (y) int32 1 2

I have some drawbacks with my code. The code only works if M0 and M1 find a value that fulfills the condition. Otherwise all values in that row will be set to NaN. To prevent issues with M0, I decided to just have the x-values start at 0 since my target value is always larger than 0. To prevent issues with M1, I choose my values of x large enough so I know my values are in there. Naturally, these are not ideal solutions and may break the code. If I get a bit more experience with xarray and python I might rewrite. In summary I have the following items I would like to solve:

How to extrapolate values outside the x-range? I'm currently just ensuring my x-range is large enough that the answers falls within it.
How to make the code robust for a variable stepsize?
How to make the code so that my dimension can be chosen dynamically (now it only works for 'x')
Any optimizations are appreciated.

xarray reverse interpolation (on coordinate, not on data)

Tags:

python

interpolation

python-xarray

Hoogendijk

2 Answers

jojo

Hoogendijk

Recent Activity

Donate For Us

xarray reverse interpolation (on coordinate, not on data)

Tags:

python

interpolation

python-xarray

Hoogendijk

2 Answers

jojo

Hoogendijk

Related questions

Recent Activity

Donate For Us