Using numpy.vstack in numba

Tags:

So I have been trying to optimize some code that calculates a statistical error metric from some array data. The metric is called the Continuous Rank Probability Score (CRPS).

I have been using Numba to try to speed up the double for loop required in this calculation, but I have been having an issue with the numpy.vstack function. From what I understand from the docs here, the vstack() function should be supported, but when I run the following code I get an error.

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):
    """Calculate the the continuous ranked probability score (CRPS) as per equation 25-27 in
    Hersbach et al. (2000)

    Parameters
    ----------
    obs: 1D ndarry
        Array of observations for each start date
    fcst_ens: 2D ndarray
        Array of ensemble forecast of dimension n x M, where n = number of start dates and
        M = number of ensemble members.

    remove_neg: bool
        If True, when a negative value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    remove_zero: bool
        If true, when a zero value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    Returns
    -------
    dict
        Dictionary contains a number of *experimental* outputs including:
            - ["crps"] 1D ndarray of crps values per n start dates.
            - ["crpsMean1"] arithmetic mean of crps values.
            - ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).

    Notes
    -----
    **NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the
    corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.

    References
    ----------
    - Hersbach, H. (2000) Decomposition of the Continuous Ranked Porbability Score
      for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.
    """
    # Treating the Data
    obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)

    # Set parameters
    n = fcst_ens.shape[0]  # number of forecast start dates
    m = fcst_ens.shape[1]  # number of ensemble members

    # Create vector of pi's
    p = np.linspace(0, m, m + 1)
    pi = p / m

    crps_numba = np.zeros(n)

    @njit
    def calculate_crps():
        # Loop fcst start times
        for i in prange(n):

            # Initialise vectors for storing output
            a = np.zeros(m - 1)
            b = np.zeros(m - 1)

            # Verifying analysis (or obs)
            xa = obs[i]

            # Ensemble fcst CDF
            x = np.sort(fcst_ens[i, :])

            # Deal with 0 < i < m [So, will loop 50 times for m = 51]
            for j in prange(m - 1):

                # Rule 1
                if xa > x[j + 1]:
                    a[j] = x[j + 1] - x[j]
                    b[j] = 0

                # Rule 2
                if x[j] < xa < x[j + 1]:
                    a[j] = xa - x[j]
                    b[j] = x[j + 1] - xa

                # Rule 3
                if xa < x[j]:
                    a[j] = 0
                    b[j] = x[j + 1] - x[j]

            # Deal with outliers for i = 0, and i = m,
            # else a & b are 0 for non-outliers
            if xa < x[0]:
                a1 = 0
                b1 = x[0] - xa
            else:
                a1 = 0
                b1 = 0

            # Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)
            if xa > x[m - 1]:
                am = xa - x[m - 1]
                bm = 0
            else:
                am = 0
                bm = 0

            # Combine full a & b vectors including outlier
            a = np.concatenate((np.array([0]), a, np.array([am])))
            # a = np.insert(a, 0, a1)
            # a = np.append(a, am)
            a = np.concatenate((np.array([0]), a, np.array([bm])))
            # b = np.insert(b, 0, b1)
            # b = np.append(b, bm)

            # Populate a_mat and b_mat
            if i == 0:
                a_mat = a
                b_mat = b
            else:
                a_mat = np.vstack((a_mat, a))
                b_mat = np.vstack((b_mat, b))

            # Calc crps for individual start times
            crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()

        return crps_numba, a_mat, b_mat

    crps, a_mat, b_mat = calculate_crps()
    print(crps)
    # Calc mean crps as simple mean across crps[i]
    crps_mean_method1 = np.mean(crps)

    # Calc mean crps across all start times from eqn. 28 in Hersbach (2000)
    abar = np.mean(a_mat, 0)
    bbar = np.mean(b_mat, 0)
    crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()

    # Output array as a dictionary
    output = {'crps': crps, 'crpsMean1': crps_mean_method1,
              'crpsMean2': crps_mean_method2}

    return output

The error I get is this:

Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path

File "test.py", line 86:
    def calculate_crps():
        <source elided>
            if i == 0:
                a_mat = a
                ^

[1] During: typing of assignment at *path

File "test.py", line 89:
    def calculate_crps():
        <source elided>
            else:
                a_mat = np.vstack((a_mat, a))
                ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

431

asked Aug 08 '18 19:08

pythonweb

1 Answers

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

TL;DR: It's not vstack that is the problem. The problem is that you have code-paths that try to assign different types of arrays to the same variable (which throws that unification exception).

The problem lies here:

# Populate a_mat and b_mat
if i == 0:
    a_mat = a
    b_mat = b
else:
    a_mat = np.vstack((a_mat, a))
    b_mat = np.vstack((b_mat, b))

In the first code-path you assign a 1d c-contigous float64 array to a_mat and b_mat and in the else it's a 2d c-contiguous float64 array. These types are not compatible and thus numba throws an error. It's sometimes tricky that numba code doesn't work like Python code, where it doesn't matter what types you have when you assign something to a variable. However in recent releases the numba exception messages got a lot better so if you know what the exception hints at you can normally quickly find out what's the problem.

Longer Explanation

The problem is that numba infers the types of your variables implicitly. For example:

from numba import njit

@njit
def func(arr):
    a = arr
    return a

Here I haven't typed the function so I need to run it once:

>>> import numpy as np
>>> func(np.zeros(5))
array([0., 0., 0., 0., 0.])

Then you can inspect the types:

>>> func.inspect_types()

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-4-02470248b065>
# --- LINE 3 --- 
# label 0

@njit

# --- LINE 4 --- 

def func(arr):

    # --- LINE 5 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   a = arr  :: array(float64, 1d, C)
    #   del arr

    a = arr

    # --- LINE 6 --- 
    #   $0.3 = cast(value=a)  :: array(float64, 1d, C)
    #   del a
    #   return $0.3

    return a

As you can see the variable a is typed for an input of type array(float64, 1d, C) as array(float64, 1d, C).

Now, let's use np.vstack instead:

from numba import njit
import numpy as np

@njit
def func(arr):
    a = np.vstack((arr, arr))
    return a

And the mandatory first call to compile it:

>>> func(np.zeros(5))
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Then inspect the types again:

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-11-f0214d5181c6>
# --- LINE 4 --- 
# label 0

@njit

# --- LINE 5 --- 

def func(arr):

    # --- LINE 6 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   $0.1 = global(np: <module 'numpy'>)  :: Module(<module 'numpy'>)
    #   $0.2 = getattr(value=$0.1, attr=vstack)  :: Function(<function vstack at 0x000001DB7082A400>)
    #   del $0.1
    #   $0.5 = build_tuple(items=[Var(arr, <ipython-input-11-f0214d5181c6> (6)), Var(arr, <ipython-input-11-f0214d5181c6> (6))])  :: tuple(array(float64, 1d, C) x 2)
    #   del arr
    #   $0.6 = call $0.2($0.5, func=$0.2, args=[Var($0.5, <ipython-input-11-f0214d5181c6> (6))], kws=(), vararg=None)  :: (tuple(array(float64, 1d, C) x 2),) -> array(float64, 2d, C)
    #   del $0.5
    #   del $0.2
    #   a = $0.6  :: array(float64, 2d, C)
    #   del $0.6

    a = np.vstack((arr, arr))

    # --- LINE 7 --- 
    #   $0.8 = cast(value=a)  :: array(float64, 2d, C)
    #   del a
    #   return $0.8

    return a

This time a is typed as array(float64, 2d, C) for an input of array(float64, 1d, C).

You may have asked yourself why I'm talking about that. Let's see what happens if you try to conditionally assign to a:

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = arr
    return a

If you now run the code:

>>> func(np.zeros(5), True)

TypingError: Failed at nopython (nopython frontend)
Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'a', defined at <ipython-input-16-f4bd9a4f377a> (7)

File "<ipython-input-16-f4bd9a4f377a>", line 7:
def func(arr, condition):
    <source elided>
    if condition:
        a = np.vstack((arr, arr))
        ^

[1] During: typing of assignment at <ipython-input-16-f4bd9a4f377a> (9)

File "<ipython-input-16-f4bd9a4f377a>", line 9:
def func(arr, condition):
    <source elided>
    else:
        a = arr
        ^

That's exactly the problem you have and it's because variables need to have one and only one type in numba for a fixed set of input types. And because the dtype, the rank (number of dimensions), and the contiguous property are all part of the type you cannot assign arrays with different dimensions to the same variable.

Note that you could expand the dimensions to make it works and squeeze the unnecessary dimensions again from the result (probably not very nice but it should solve the problem with minimal amount of "changes":

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = np.expand_dims(arr, 0)
    return a

>>> func(np.zeros(5), False)
array([[0., 0., 0., 0., 0.]])  # <-- 2d array!
>>> func(np.zeros(5), True)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

175

answered Oct 13 '22 21:10

MSeifert

Related questions
                            
                                What's the inferred name of variables in argparse in conflicting cases
                            
                                How to set the timeout of 'driver.get' for python selenium 3.8.0?
                            
                                Seaborn heatmap, custom tick values
                            
                                Round to nearest 1000 in pandas
                            
                                Pandas, how to combine multiple columns into an array column
                            
                                Django '/' only homepage url error
                            
                                Making numpy arrays JSON serializable
                            
                                opposite of df.diff() in pandas
                            
                                What does x in range(...) == y mean in Python 3? [duplicate]
                            
                                Django's template tag inside javascript
                            
                                Unit test pyspark code using python
                            
                                Python: Normalize image exposure
                            
                                Keep a column with a categorical variable in Pandas with groupby and mean()
                            
                                Flask ImportError: cannot import name app
                            
                                How to apply a condition to pandas iloc
                            
                                Why Doc2vec gives 2 different vectors for the same texts
                            
                                Start CloudSQL Proxy on Python Dataflow / Apache Beam
                            
                                Custom weight initialization in PyTorch
                            
                                Individual axes limits for pairplot in python
                            
                                Prune unnecessary leaves in sklearn DecisionTreeClassifier

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Using numpy.vstack in numba

Tags:

python

optimization

python-3.x

numpy

numba

pythonweb

People also ask

1 Answers

Longer Explanation

MSeifert

Recent Activity

Donate For Us