I have a number of series in a pandas dataframe representing rates observed yearly. For an experiment, I want some of these series' rates to converge towards one of the other series' rate in the last observed year.
For example, say I have the data below, and I decide column a is a meaningful target for column b to approach asymptotically over, say, a ten-year period in small, even-sized increments (increasing or decreasing; it doesn't really matter).
I could of course do this in a loop, but I was wondering if there is a more general NumPy or SciPy vectorized way of making one series approach another asymptotically, off the shelf.
rate            a         b
year
2006     0.393620  0.260998
2007     0.408620  0.260527
2008     0.396732  0.257396
2009     0.418029  0.249123
2010     0.414246  0.253526
2011     0.415873  0.256586
2012     0.414616  0.253865
2013     0.408332  0.257504
2014     0.401821  0.259208
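For the question as stated, one minimal sketch (assuming a simple linear ramp of even-sized increments is acceptable) is to let `np.linspace` generate the whole path in a single vectorized call. The frame below is rebuilt from the data above; the year range 2015–2024 for the projection is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
idx = pd.Index(range(2006, 2015), name="year")
df = pd.DataFrame(
    {"a": [0.393620, 0.408620, 0.396732, 0.418029, 0.414246,
           0.415873, 0.414616, 0.408332, 0.401821],
     "b": [0.260998, 0.260527, 0.257396, 0.249123, 0.253526,
           0.256586, 0.253865, 0.257504, 0.259208]},
    index=idx,
)
df.columns.name = "rate"

# Target: column a's last observed rate; start: column b's last observed rate
target = df["a"].iloc[-1]
start = df["b"].iloc[-1]

# Ten projected years of even-sized steps from b's last value to the target
years = np.arange(2015, 2025)
steps = np.linspace(start, target, len(years) + 1)[1:]
projection = pd.Series(steps, index=pd.Index(years, name="year"), name="b")
```

For a truly asymptotic (rather than linear) approach, any decaying schedule can replace the `linspace` weights, e.g. an exponential one that closes a fixed fraction of the remaining gap each year; that is the idea the answer below develops more generally with easing functions.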
Generally speaking, you'd apply an "easing function" over some range.
For example, consider the figure produced by the plotting code at the end of this answer (four stacked panels: the two original series, their difference, the easing function, and the blended result).
Here, we have two original datasets. We'll subtract the two, multiply the difference by the easing function shown in the third panel, and then add the result back to the first curve. The result is a new series that matches one original curve (series2) to the left of the gray region, is a blend of the two within the gray region, and matches the other curve (series1) to the right of the gray region.
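In other words, the blend at each point is just a weighted average: easing * series2 + (1 - easing) * series1, which is algebraically the same as the easing * diff + series1 form used in the code below. A tiny illustrative example with made-up numbers:

```python
import numpy as np

s1 = np.array([10.0, 10.0, 10.0])   # first curve
s2 = np.array([2.0, 2.0, 2.0])      # second curve
easing = np.array([1.0, 0.5, 0.0])  # weight on s2 at each point

# Equivalent to easing * (s2 - s1) + s1
blended = easing * s2 + (1 - easing) * s1
print(blended)  # [ 2.  6. 10.]
```

With easing = 1 the result reproduces s2 exactly, with easing = 0 it reproduces s1, and intermediate weights interpolate between them.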
As an example:
import numpy as np
import matplotlib.pyplot as plt
# Generate some interesting random data
np.random.seed(1)
series1 = np.random.normal(0, 1, 1000).cumsum() + 20
series2 = np.random.normal(0, 1, 1000).cumsum()
# Our x-coordinates
index = np.arange(series1.size)
# Boundaries of the gray "easing region"
i0, i1 = 300, 700
# In this case, I've chosen a sinusoidal easing function...
x = np.pi * (index - i0) / (i1 - i0)
easing = 0.5 * np.cos(x) + 0.5
# To the left of the gray region, easing should be 1 (all series2)
easing[index < i0] = 1
# To the right, it should be 0 (all series1)
easing[index >= i1] = 0
# Now let's calculate the new series that will slowly approach the first
# We'll operate on the difference and then add series1 back in
diff = series2 - series1
series3 = easing * diff + series1
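As a quick sanity check (not part of the original answer), the endpoint behaviour described above can be verified by reconstructing the same arrays with the same seed and asserting that the blended series coincides with series2 left of the easing window and with series1 right of it:

```python
import numpy as np

np.random.seed(1)
series1 = np.random.normal(0, 1, 1000).cumsum() + 20
series2 = np.random.normal(0, 1, 1000).cumsum()
index = np.arange(series1.size)

i0, i1 = 300, 700
x = np.pi * (index - i0) / (i1 - i0)
easing = 0.5 * np.cos(x) + 0.5
easing[index < i0] = 1
easing[index >= i1] = 0

series3 = easing * (series2 - series1) + series1

# All series2 before the window, all series1 after it
assert np.allclose(series3[:i0], series2[:i0])
assert np.allclose(series3[i1:], series1[i1:])
```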
Also, if you're curious about the plot above, here's how it's generated:
fig, axes = plt.subplots(nrows=4, sharex=True)
axes[0].plot(series1, color='lightblue', lw=2)
axes[0].plot(series2, color='salmon', lw=1.5)
axes[0].set(ylabel='Original Series')
axes[1].plot(diff, color='gray')
axes[1].set(ylabel='Difference')
axes[2].plot(easing, color='black', lw=2)
axes[2].margins(y=0.1)
axes[2].set(ylabel='Easing')
axes[3].plot(series1, color='lightblue', lw=2)
axes[3].plot(series3, color='salmon', ls='--', lw=2, dashes=(12,20))
axes[3].set(ylabel='Modified Series')
for ax in axes:
    ax.locator_params(axis='y', nbins=4)
for ax in axes[-2:]:
    ax.axvspan(i0, i1, color='0.8', alpha=0.5)
plt.show()