 

Correct way to implement piecewise function in pandas / numpy

I need to create a model function to pass to scipy.optimize.curve_fit. In my case, the function is best defined as a piecewise function.

I know that the following doesn't work, but I'm showing it since it makes the intent of the function clear:

def model_a(X, x1, x2, m1, b1, m2, b2):
    '''f(x) has form m1*x + b1 below x1, m2*x + b2 above x2, and is
    a cubic spline between those two points.'''
    y1 = m1 * X + b1
    y2 = m2 * X + b2
    if X <= x1:
        return y1    # function is linear below x1
    if X >= x2:
        return y2    # function is linear above x2
    # use a cubic spline to interpolate between lower
    # and upper line segment
    a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
    return cubic(X, a, b, c, d)

The problem, of course, is that X is a pandas Series: the expression (X <= x1) evaluates to a Series of booleans rather than a single bool, so the if statement fails with "ValueError: The truth value of a Series is ambiguous."
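A minimal reproduction with a made-up three-element Series shows where the failure happens: the comparison itself works elementwise, and only the attempt to collapse the resulting boolean Series into one True/False value raises.

```python
import pandas as pd

X = pd.Series([1.0, 2.0, 3.0])

# Elementwise comparison is fine: it yields a boolean Series.
mask = X <= 2.0
print(mask.tolist())  # [True, True, False]

# An `if` statement needs a single truth value, so pandas refuses to guess.
error_message = None
try:
    if X <= 2.0:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Branching per element therefore has to be expressed with masks rather than if statements, which is the job np.piecewise is meant to do.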

It appears that np.piecewise() is designed for exactly this situation: "Wherever condlist[i] is True, funclist[i](x) is used as the output value." So I tried this:

def model_b(X, x1, x2, m1, b1, m2, b2):
    def lo(x):
        return m1 * x + b1
    def hi(x):
        return m2 * x + b2
    def mid(x):
        y1 = m1 * x1 + b1   # value of the lower line at x1
        y2 = m2 * x2 + b2   # value of the upper line at x2
        a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
        return a * x * x * x + b * x * x + c * x + d

    return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])

But this fails at this call:

return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])

with the message "IndexError: too many indices for array". My guess is that it's objecting to condlist having two elements while funclist has three, but the docs state explicitly that an extra function in funclist is used as the default, wherever none of the conditions is true.
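For comparison, the same call shape (two conditions, three functions) works on a plain NumPy array, which suggests the funclist length is not the problem. This is a small check with made-up values and lambdas, not the model itself:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.piecewise(
    x,
    [x <= 1.0, x >= 3.0],      # two conditions...
    [
        lambda v: -v,          # used where x <= 1
        lambda v: 10.0 * v,    # used where x >= 3
        lambda v: v * v,       # ...plus a default where neither holds
    ],
)
print(y)
```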

Any guidance?

asked Feb 04 '18 02:02 by fearless_fool





1 Answer

This check in NumPy's implementation of np.piecewise only recognizes lists and ndarrays:

# undocumented: single condition is promoted to a list of one condition
if isscalar(condlist) or (
        not isinstance(condlist[0], (list, ndarray)) and x.ndim != 0):
    condlist = [condlist]

Thus, if X is a Series, then condlist = [X<=x1, X>=x2] is a list of two Series. Since condlist[0] is neither a list nor an ndarray (a pandas Series is not an ndarray subclass), condlist is "promoted" to a list containing a single condition:

condlist = [condlist]
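The promotion test can be reproduced in isolation. This sketch uses made-up values and applies the isinstance check from the excerpt by hand; it is not NumPy's own code path, only the same test:

```python
import numpy as np
import pandas as pd

X = pd.Series(np.linspace(0.0, 1.0, 5))
cond_series = [X <= 0.25, X >= 0.75]                 # list of two Series
cond_array = [X.values <= 0.25, X.values >= 0.75]    # list of two ndarrays

# Same test as in np.piecewise: is condlist[0] a list or an ndarray?
series_promoted = not isinstance(cond_series[0], (list, np.ndarray))
array_promoted = not isinstance(cond_array[0], (list, np.ndarray))
print(series_promoted, array_promoted)

# After promotion, the two Series become ONE condition of shape (2, 5).
promoted = np.asarray([cond_series], dtype=bool)
print(promoted.shape)
```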

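That is not what we want, so X should be converted to a NumPy array before condlist is built; the conditions are then ndarrays and pass the check. (np.asarray(X) works as well, and also accepts an X that is already an ndarray.)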

X = X.values

For example,

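import numpy as np
import pandas as pd

def model_b(X, x1, x2, m1, b1, m2, b2):
    def lo(x):
        return m1 * x + b1
    def hi(x):
        return m2 * x + b2
    def mid(x):
        y1 = m1 * x1 + b1   # lower line evaluated at the breakpoint x1
        y2 = m2 * x2 + b2   # upper line evaluated at the breakpoint x2
        # fit_cubic is the question's helper (not shown here), so placeholder
        # coefficients are used to keep this example runnable:
        # a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
        a, b, c, d = 1, 2, 3, 4
        return a * x**3 + b * x**2 + c * x + d
    X = X.values   # Series -> ndarray, so X<=x1 and X>=x2 are ndarrays too
    return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])

X = pd.Series(np.linspace(0, 100, 100))
x1, x2, m1, b1, m2, b2 = 30, 60, 10, 5, -20, 30
f = model_b(X, x1, x2, m1, b1, m2, b2)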
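The question never shows fit_cubic, so the version below is a hypothetical stand-in, assuming it should return the coefficients of the cubic that matches the value and slope of the lower line at x1 and of the upper line at x2 (a Hermite-style join). Under that assumption, a complete model can be sketched as follows; model_c is an illustrative name, and np.asarray replaces X.values so the function accepts either a Series or a plain ndarray:

```python
import numpy as np
import pandas as pd


def fit_cubic(x1, y1, x2, y2, m1, m2):
    """Hypothetical stand-in for the question's fit_cubic: returns (a, b, c, d)
    such that a*x**3 + b*x**2 + c*x + d has value y1 and slope m1 at x1,
    and value y2 and slope m2 at x2."""
    A = np.array([
        [x1**3, x1**2, x1, 1.0],        # value at x1
        [x2**3, x2**2, x2, 1.0],        # value at x2
        [3 * x1**2, 2 * x1, 1.0, 0.0],  # slope at x1
        [3 * x2**2, 2 * x2, 1.0, 0.0],  # slope at x2
    ], dtype=float)
    rhs = np.array([y1, y2, m1, m2], dtype=float)
    return np.linalg.solve(A, rhs)


def model_c(X, x1, x2, m1, b1, m2, b2):
    # Series or ndarray -> float ndarray (np.piecewise gives its output
    # the same dtype as X, so an integer X would truncate the results).
    X = np.asarray(X, dtype=float)
    # Join the two lines at their own breakpoints, matching value and slope.
    a, b, c, d = fit_cubic(x1, m1 * x1 + b1, x2, m2 * x2 + b2, m1, m2)
    return np.piecewise(
        X,
        [X <= x1, X >= x2],
        [
            lambda x: m1 * x + b1,                    # below x1
            lambda x: m2 * x + b2,                    # above x2
            lambda x: ((a * x + b) * x + c) * x + d,  # cubic in between (Horner form)
        ],
    )


X = pd.Series(np.linspace(0, 100, 101))
y = model_c(X, 30, 60, 10, 5, -20, 30)
print(y[[29, 30, 45, 60, 61]])
```

Since model_c only relies on NumPy operations on its converted input, it should be usable directly as the f argument of scipy.optimize.curve_fit together with your own data and a starting guess p0. A sensible guess for x1 and x2 likely matters, because the breakpoints only enter through the masks and the spline coefficients.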
answered Sep 28 '22 05:09 by unutbu
