
numpy / scipy: Making one series converge towards another after a period of time

I have a number of series in a pandas dataframe representing rates observed yearly.

For an experiment, I want the rates in some of these series to converge towards the rate of one of the other series in the last observed year.

For example, say I have the data below, and I decide column a is a meaningful target for column b to approach asymptotically over, say, a ten-year period in small, even-sized increments (or decreasing ones; it doesn't really matter).

I could of course do this in a loop, but I was wondering if there was a more general numpy or scipy vectorized way of making one series approach another asymptotically off the shelf.

rate         a         b
year
2006  0.393620  0.260998
2007  0.408620  0.260527
2008  0.396732  0.257396
2009  0.418029  0.249123
2010  0.414246  0.253526
2011  0.415873  0.256586
2012  0.414616  0.253865
2013  0.408332  0.257504
2014  0.401821  0.259208
asked Dec 07 '15 19:12 by ako




1 Answer

Generally speaking, you'd apply an "easing function" over some range.

For example, consider the figure below:

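[Figure: four stacked panels sharing an x-axis, labeled "Original Series", "Difference", "Easing", and "Modified Series", with the easing region shaded gray]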

Here, we have two original datasets. We'll subtract one from the other, multiply the difference by the easing function shown in the third row, and then add the result back to the first curve. The result is a new series that matches the second curve to the left of the gray region, is a blend of the two within the gray region, and matches the first curve to the right of the gray region.
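Written out, the blend described above looks roughly like this. The symbols are my own shorthand rather than anything from the figure: s_1 and s_2 are the two original series, e is the easing function, and i_0, i_1 are the left and right edges of the gray region. The cosine form of e is just one possible choice.

```latex
s_3[i] = s_1[i] + e[i]\,\bigl(s_2[i] - s_1[i]\bigr),
\qquad
e[i] =
\begin{cases}
1, & i < i_0 \\[2pt]
\tfrac{1}{2} + \tfrac{1}{2}\cos\!\left(\pi\,\dfrac{i - i_0}{i_1 - i_0}\right), & i_0 \le i < i_1 \\[2pt]
0, & i \ge i_1
\end{cases}
```

With e = 1 the result is exactly s_2, with e = 0 it is exactly s_1, and anything in between is a weighted mix of the two.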

As an example:

import numpy as np
import matplotlib.pyplot as plt

# Generate some interesting random data
np.random.seed(1)
series1 = np.random.normal(0, 1, 1000).cumsum() + 20
series2 = np.random.normal(0, 1, 1000).cumsum()
# Our x-coordinates
index = np.arange(series1.size)

# Boundaries of the gray "easing region"
i0, i1 = 300, 700    

# In this case, I've chosen a sinusoidal easing function...
x = np.pi * (index - i0) / (i1 - i0)
easing = 0.5 * np.cos(x) + 0.5

# To the left of the gray region, easing should be 1 (all series2)
easing[index < i0] = 1

# To the right, it should be 0 (all series1)
easing[index >= i1] = 0

# Now let's calculate the new series that will slowly approach the first
# We'll operate on the difference and then add series1 back in 
diff = series2 - series1
series3 = easing * diff + series1
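For the data in the question, the same idea could be sketched as below. This is only an illustration under a few assumptions of mine: column a is held constant at its 2014 value as the target, the transition runs over the ten years after 2014, and the names `n_years`, `t`, and `b_projected` are made up for the example.

```python
import numpy as np
import pandas as pd

# Data from the question, indexed by year
df = pd.DataFrame(
    {
        "a": [0.393620, 0.408620, 0.396732, 0.418029, 0.414246,
              0.415873, 0.414616, 0.408332, 0.401821],
        "b": [0.260998, 0.260527, 0.257396, 0.249123, 0.253526,
              0.256586, 0.253865, 0.257504, 0.259208],
    },
    index=pd.Index(np.arange(2006, 2015), name="year"),
)

n_years = 10                 # assumed length of the transition
target = df["a"].iloc[-1]    # assumption: a stays at its last observed value
start = df["b"].iloc[-1]     # b's last observed value

# Fraction of the way through the transition: 0.1, 0.2, ..., 1.0
t = np.arange(1, n_years + 1) / n_years

# Cosine easing: close to 1 (mostly b) at first, exactly 0 (all a) at the end
easing = 0.5 * np.cos(np.pi * t) + 0.5

# Same blend as above: ease the difference, then add the target back in
last_year = df.index[-1]
future_years = pd.Index(
    np.arange(last_year + 1, last_year + 1 + n_years), name="year"
)
b_projected = pd.Series(
    target + easing * (start - target), index=future_years, name="b"
)
```

Note that the cosine easing lands exactly on the target in the final year rather than approaching it asymptotically. If a truly asymptotic approach is wanted, an exponentially decaying easing such as `np.exp(-k * t)` (with some rate `k` you'd have to choose) could be swapped in without changing the rest.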

Also, if you're curious about the plot above, here's how it's generated:

fig, axes = plt.subplots(nrows=4, sharex=True)

axes[0].plot(series1, color='lightblue', lw=2)
axes[0].plot(series2, color='salmon', lw=1.5)
axes[0].set(ylabel='Original Series')

axes[1].plot(diff, color='gray')
axes[1].set(ylabel='Difference')

axes[2].plot(easing, color='black', lw=2)
axes[2].margins(y=0.1)
axes[2].set(ylabel='Easing')

axes[3].plot(series1, color='lightblue', lw=2)
axes[3].plot(series3, color='salmon', ls='--', lw=2, dashes=(12,20))
axes[3].set(ylabel='Modified Series')

for ax in axes:
    ax.locator_params(axis='y', nbins=4)
for ax in axes[-2:]:
    ax.axvspan(i0, i1, color='0.8', alpha=0.5)

plt.show()
answered Oct 17 '22 16:10 by Joe Kington
