I need to create a function to pass to curve_fit. In my case, the function is best defined as a piecewise function.
I know that the following doesn't work, but I'm showing it since it makes the intent of the function clear:
    def model_a(X, x1, x2, m1, b1, m2, b2):
        '''f(x) has form m1*x + b1 below x1, m2*x + b2 above x2, and is
        a cubic spline between those two points.'''
        y1 = m1 * X + b1
        y2 = m2 * X + b2
        if X <= x1:
            return y1  # function is linear below x1
        if X >= x2:
            return y2  # function is linear above x2
        # use a cubic spline to interpolate between lower
        # and upper line segment
        a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
        return cubic(X, a, b, c, d)
The problem, of course, is that X is a pandas Series, and the expression (X <= x1) evaluates to a Series of booleans rather than a single bool, so the if statement fails with the message "The truth value of a Series is ambiguous."
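To make the failure concrete, here is a minimal sketch with made-up sample values (the three numbers and the threshold 30 are only for illustration). It shows that the comparison is element-wise and that using the result in an if statement raises a ValueError:

```python
import pandas as pd

X = pd.Series([10.0, 40.0, 80.0])   # illustrative sample values
mask = X <= 30                      # element-wise: a boolean Series, not one bool

try:
    if mask:                        # asks for a single truth value
        pass
except ValueError as exc:
    message = str(exc)              # "The truth value of a Series is ambiguous. ..."

print(mask.tolist())
print(message)
```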
It appears that np.piecewise() is designed for exactly this situation: "Wherever condlist[i] is True, funclist[i](x) is used as the output value." So I tried this:
    def model_b(X, x1, x2, m1, b1, m2, b2):
        def lo(x):
            return m1 * x + b1
        def hi(x):
            return m2 * x + b2
        def mid(x):
            y1 = m1 * x + b1
            y2 = m2 * x + b2
            a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
            return a * x * x * x + b * x * x + c * x + d
        return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])
But this fails at this call:
    return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])
with the message "IndexError: too many indices for array". I'm inclined to think it's objecting to the fact that there are two elements in condlist and three elements in funclist, but the docs specifically state that the extra element in funclist is treated as the default.
Any guidance?
This piece of code in NumPy's definition of np.piecewise is list/ndarray-centric:
    # undocumented: single condition is promoted to a list of one condition
    if isscalar(condlist) or (
            not isinstance(condlist[0], (list, ndarray)) and x.ndim != 0):
        condlist = [condlist]
Thus, if X is a Series, then condlist = [X<=x1, X>=x2] is a list of two Series.
Since condlist[0] is neither a list nor an ndarray, condlist is "promoted" to a list of one condition:
    condlist = [condlist]
Since this is not what we want to happen, we need to make condlist a list of NumPy arrays before passing it to np.piecewise:
    X = X.values
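To see why the conversion matters, the isinstance test from the snippet above can be reproduced in isolation. This is only a sketch with made-up sample values; it checks the types of the conditions and does not call np.piecewise itself:

```python
import numpy as np
import pandas as pd

X = pd.Series([10.0, 40.0, 80.0])   # illustrative sample values

# Conditions built from a Series are Series, which are neither list nor ndarray.
series_cond = X <= 30
series_passes = isinstance(series_cond, (list, np.ndarray))

# Conditions built from the underlying array are plain boolean ndarrays.
array_cond = X.values <= 30
array_passes = isinstance(array_cond, (list, np.ndarray))

print(series_passes, array_passes)
```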
For example,
    import numpy as np
    import pandas as pd

    def model_b(X, x1, x2, m1, b1, m2, b2):
        def lo(x):
            return m1 * x + b1
        def hi(x):
            return m2 * x + b2
        def mid(x):
            # y1 and y2 are only needed by fit_cubic, which is stubbed out below
            y1 = m1 * x + b1
            y2 = m2 * x + b2
            # a, b, c, d = fit_cubic(x1, y1, x2, y2, m1, m2)
            a, b, c, d = 1, 2, 3, 4  # placeholder coefficients
            return a * x * x * x + b * x * x + c * x + d
        # convert the Series to an ndarray so the conditions are ndarrays too
        X = X.values
        return np.piecewise(X, [X<=x1, X>=x2], [lo, hi, mid])

    X = pd.Series(np.linspace(0, 100, 100))
    x1, x2, m1, b1, m2, b2 = 30, 60, 10, 5, -20, 30
    f = model_b(X, x1, x2, m1, b1, m2, b2)
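The example above uses placeholder coefficients for the middle segment. As a rough sketch of how the pieces might fit together, the version below swaps in a cubic Hermite segment for the middle piece. Note that hermite_mid and model_c are hypothetical names, and the Hermite form is only an assumption about what fit_cubic and cubic are meant to do (match the value and slope of both lines at x1 and x2); it is not the asker's implementation. np.asarray is used in place of X.values so the function accepts a Series, a list, or an ndarray:

```python
import numpy as np
import pandas as pd


def hermite_mid(x, x1, x2, m1, b1, m2, b2):
    """Hypothetical stand-in for fit_cubic/cubic: a cubic Hermite segment
    matching the value and slope of both lines at x1 and x2."""
    y1 = m1 * x1 + b1          # value of the lower line at x1
    y2 = m2 * x2 + b2          # value of the upper line at x2
    h = x2 - x1
    t = (x - x1) / h           # rescale [x1, x2] to [0, 1]
    return ((2 * t**3 - 3 * t**2 + 1) * y1
            + (t**3 - 2 * t**2 + t) * h * m1
            + (-2 * t**3 + 3 * t**2) * y2
            + (t**3 - t**2) * h * m2)


def model_c(X, x1, x2, m1, b1, m2, b2):
    # Work on a float ndarray so np.piecewise sees ndarray conditions.
    x = np.asarray(X, dtype=float)
    return np.piecewise(
        x,
        [x <= x1, x >= x2],
        [lambda v: m1 * v + b1,                              # below x1
         lambda v: m2 * v + b2,                              # above x2
         lambda v: hermite_mid(v, x1, x2, m1, b1, m2, b2)],  # default: in between
    )


X = pd.Series([10.0, 30.0, 45.0, 60.0, 80.0])
f = model_c(X, 30, 60, 10, 5, -20, 30)
print(f)
```

Because the third entry in funclist has no matching condition, np.piecewise uses it as the default wherever neither condition holds, which is the behavior the question quotes from the docs.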