Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

xarray reverse interpolation (on coordinate, not on data)

I have a the following DataArray

arr = xr.DataArray([[0.33, 0.25],[0.55, 0.60],[0.85, 0.71],[0.92,0.85],[1.50,0.96],[2.5,1.1]],[('x',[0.25,0.5,0.75,1.0,1.25,1.5]),('y',[1,2])])

This gives the following output

<xarray.DataArray (x: 6, y: 2)>
array([[0.33, 0.25],
       [0.55, 0.6 ],
       [0.85, 0.71],
       [0.92, 0.85],
       [1.5 , 0.96],
       [2.5 , 1.1 ]])
Coordinates:
  * x        (x) float64 0.25 0.5 0.75 1.0 1.25 1.5
  * y        (y) int32 1 2

or sorted below with x and output (z) next to each other for convenience.

x         z (y=1)   z(y=2)
0.25      0.33      0.25
0.50      0.55      0.60
0.75      0.85      0.71
1.00      0.92      0.85
1.25      1.50      0.96
1.50      2.50      1.10

The data I have is the result of several input values. One of them is the x value. There are several other dimensions (such as y) for other input values. I want to know when my output value (z) is growing larger than 1.00, keeping the other dimensions fixed and vary the x-value. In the above 2-dimensional example, I would like to get the answer [1.03 1.32]. Because a value of 1.03 for x will give me 1.00 for z when y=1 and a value of 1.32 for x will give me 1.00 for z when y=2.

edit: Since the output z will grow with increasing x, there is only one point where z will have 1.0 as an output.

Is there any efficient way to achieve this with xarray? My actual table is much larger and has 4 inputs (dimensions).

Thank you for any help!

like image 348
Hoogendijk Avatar asked Apr 15 '20 21:04

Hoogendijk


2 Answers

xarray has a very handy function for this: xr.interp which will do a piecewise linear interpolation of an xarray.

In your case, you can use it to obtain a piecewise interpolation of the (x, y1) and (x, y1) points. Once this is done, the only thing that remains to do is getting the value of your interpolated x array that is associated to the closes value of your interpolated y1/y2/.. array to the target number (1.00 in your example).

Here is how this could look like:

y_dims = [0, 1,] 
target_value = 1.0
# create a 'high resolution` version of your data array:
arr_itp = arr.interp(x=np.linspace(arr.x.min(), arr.x.max(), 10000))
for y in y_dims:
    # get the index of closest data
    x_closest = np.abs(arr_itp.isel(y=y) - target_value).argmin()
    print(arr_itp.isel(y=y, x=x_closest))

>>> <xarray.DataArray ()>
>>> array(0.99993199)
>>> Coordinates:
>>>     y        int64 1
>>>     x        float64 1.034
>>> <xarray.DataArray ()>
>>> array(1.00003)
>>> Coordinates:
>>>     y        int64 2
>>>     x        float64 1.321



While this works, it is not a really efficient way to approach the problem and here are 2 reasons why not:

  1. Using xr.interp does a piecewise interpolation of the entire DataArray. Howerver, we only ever need the interpolation between the two points closest to your target value.
  2. Here, an interpolation is a straight line between 2 points. But if we know one coordinate of a point on that line (y=1.00) then we can simply calculate the other coordinate by resolving the linear equation of the straight line and the problem is solved in a few arithmetic operations.

Taking these reasons into account we can develop a more efficient solution to your problem:

# solution of linear function between two points (2. reason)
def lin_itp(p1,p2,tv):
    """Get x coord of point on line

    Determine the x coord. of a point (x, target_value) on the line
    through the points p1, p2.

    Approach:
      - parametrize x, y between p1 and p2: 
          x = p1[0] + t*(p2[0]-p1[0])
          y = p1[1] + t*(p2[1]-p1[1])
      - set y = tv and resolve 2nd eqt for t
          t = (tv - p1[1]) / (p2[1] - p1[1])
      - replace t in 1st eqt with solution for t
          x = p1[0] + (tv - p1[1])*(p2[0] - p1[0])/(p2[1] - p1[1])
    """
    return float(p1[0] + (tv - p1[1])*(p2[0] - p1[0])/(p2[1] - p1[1])) 

# target value:
t_v = 1.0
for y in [0, 1]:
    arr_sd = arr.isel(y=y)
    # get index for the value closest to the target value (but smaller)
    s_udim = int(xr.where(arr_sd - t_v <=0, arr_sd, arr_sd.min()).argmax())
    # I'm explicitly defining the two points here
    ps_itp = arr_sd[s_udim:s_udim+2]
    p1, p2 = (ps_itp.x[0], ps_itp[0]), (ps_itp.x[1], ps_itp[1])
    print(lin_itp(p1,p2,t_v))

>>> 1.0344827586206897
>>> 1.3214285714285714


like image 150
jojo Avatar answered Nov 02 '22 00:11

jojo


The problem I had with jojo's answer is that it is difficult to expand it in many dimensions and to keep the xarray structure. Hence, I decided to look further into this. I used some ideas from jojo's code to make below answer.

I make two arrays, one with the condition that the values are smaller than what I look for, and one with the condition they need to be larger. I shift the second one in the x-direction by minus 1. Now I combine them in a normal linear interpolation formula. The two arrays only have values overlapping at the 'edge' of the condition. If not shifted by -1, no values would overlap. In the final line I sum over the x-direction and since al other values are NaN, I extract the correct value and remove the x-direction from the DataArray in the process.

def interpolate_dimension_x(arr, target_value, step):
    M0 = arr.where(arr - target_value <= 0)
    M1 = arr.where(arr - target_value > 0).shift(x=-1)

    work_mat = M0.x + step * (target_value - M0) / (M1 - M0)

    return work_mat.sum(dim='x')
interpolate_dimension_x(arr, 1, 0.25)

>>> <xarray.DataArray (y: 2)>
array([1.034483, 1.321429])
Coordinates:
  * y        (y) int32 1 2

I have some drawbacks with my code. The code only works if M0 and M1 find a value that fulfills the condition. Otherwise all values in that row will be set to NaN. To prevent issues with M0, I decided to just have the x-values start at 0 since my target value is always larger than 0. To prevent issues with M1, I choose my values of x large enough so I know my values are in there. Naturally, these are not ideal solutions and may break the code. If I get a bit more experience with xarray and python I might rewrite. In summary I have the following items I would like to solve:

  • How to extrapolate values outside the x-range? I'm currently just ensuring my x-range is large enough that the answers falls within it.
  • How to make the code robust for a variable stepsize?
  • How to make the code so that my dimension can be chosen dynamically (now it only works for 'x')
  • Any optimizations are appreciated.
like image 26
Hoogendijk Avatar answered Nov 02 '22 01:11

Hoogendijk