Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive definitions in Pandas

I have a time-series A holding several values. I need to obtain a series B that is defined algebraically as follows:

B[t] = a * A[t] + b * B[t-1]

where we can assume B[0] = 0, and a and b are real numbers.

Is there any way to do this type of recursive computation in Pandas? Or do I have no choice but to loop in Python as suggested in this answer?

As an example of input:

> A = pd.Series(np.random.randn(10,))

0   -0.310354
1   -0.739515
2   -0.065390
3    0.214966
4   -0.605490
5    1.293448
6   -3.068725
7   -0.208818
8    0.930881
9    1.669210
like image 572
Josh Avatar asked Oct 08 '14 23:10

Josh


People also ask

What is recursive definition logic?

A recursive definition of a function defines values of the function for some inputs in terms of the values of the same function for other (usually smaller) inputs. For example, the factorial function n! is defined by the rules. 0! = 1.

How do I check if a string contains a substring in pandas?

Using “contains” to Find a Substring in a Pandas DataFrame The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not. A basic application of contains should look like Series. str. contains("substring") .

What is the fastest way to output large DataFrame into a csv file?

savetxt is still the fastest option, but only by a narrow margin: when benchmarked with pandas 1.3. 0 and numpy 1.20. 3, aa. to_csv() took 2.64 s while savetxt 2.53 s.


1 Answers

As I noted in a comment, you can use scipy.signal.lfilter. In this case (assuming A is a one-dimensional numpy array), all you need is:

B = lfilter([a], [1.0, -b], A)

Here's a complete script:

import numpy as np
from scipy.signal import lfilter


np.random.seed(123)

A = np.random.randn(10)
a = 2.0
b = 3.0

# Compute the recursion using lfilter.
# [a] and [1, -b] are the coefficients of the numerator and
# denominator, resp., of the filter's transfer function.
B = lfilter([a], [1, -b], A)

print B

# Compare to a simple loop.
B2 = np.empty(len(A))
for k in range(0, len(B2)):
    if k == 0:
        B2[k] = a*A[k]
    else:
        B2[k] = a*A[k] + b*B2[k-1]

print B2

print "max difference:", np.max(np.abs(B2 - B))

The output of the script is:

[ -2.17126121e+00  -4.51909273e+00  -1.29913212e+01  -4.19865530e+01
  -1.27116859e+02  -3.78047705e+02  -1.13899647e+03  -3.41784725e+03
  -1.02510099e+04  -3.07547631e+04]
[ -2.17126121e+00  -4.51909273e+00  -1.29913212e+01  -4.19865530e+01
  -1.27116859e+02  -3.78047705e+02  -1.13899647e+03  -3.41784725e+03
  -1.02510099e+04  -3.07547631e+04]
max difference: 0.0

Another example, in IPython, using a pandas DataFrame instead of a numpy array:

If you have

In [12]: df = pd.DataFrame([1, 7, 9, 5], columns=['A'])

In [13]: df
Out[13]: 
   A
0  1
1  7
2  9
3  5

and you want to create a new column, B, such that B[k] = A[k] + 2*B[k-1] (with B[k] == 0 for k < 0), you can write

In [14]: df['B'] = lfilter([1], [1, -2], df['A'].astype(float))

In [15]: df
Out[15]: 
   A   B
0  1   1
1  7   9
2  9  27
3  5  59
like image 75
Warren Weckesser Avatar answered Sep 28 '22 10:09

Warren Weckesser