Pandas: flag consecutive values

I have a pandas series of the form [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1].

0: indicates economic increase.
1: indicates economic decline.

A recession is signaled by two consecutive declines (1).

The end of the recession is signaled by two consecutive increases (0).

In the above dataset I have two recessions: the first begins at index 3 and ends at index 5, and the second begins at index 8 and ends at index 11.

I am at a loss for how to approach this with pandas. I would like to identify the index for the start and end of each recession. Any assistance would be appreciated.

Here is my Python attempt at a solution.

import numpy as np

np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
recession_start_flag = 0  # 1 while inside a recession
recession_start = []
recession_end = []

for i in range(len(np_decline) - 1):
    # Two consecutive declines open a recession.
    if recession_start_flag == 0 and np_decline[i] == 1 and np_decline[i + 1] == 1:
        recession_start.append(i)
        recession_start_flag = 1
    # Two consecutive increases close it; the last decline was at i - 1.
    if recession_start_flag == 1 and np_decline[i] == 0 and np_decline[i + 1] == 0:
        recession_end.append(i - 1)
        recession_start_flag = 0

print(recession_start)  # [3, 8]
print(recession_end)    # [5, 11]

Is there a more pandas-centric approach?

Leon Adams asked Nov 11 '16 19:11


2 Answers

The start of a run of 1's satisfies the condition

x_prev = x.shift(1)
x_next = x.shift(-1)
((x_prev != 1) & (x == 1) & (x_next == 1))

That is to say, the value at the start of a run is 1 and the previous value is not 1 and the next value is 1. Similarly, the end of a run satisfies the condition

((x == 1) & (x_next == 0) & (x_next2 == 0))

since the value at the end of a run is 1 and the next two values (x_next and x_next2 = x.shift(-2)) are 0. We can find the indices where these conditions are true using np.flatnonzero:

import numpy as np
import pandas as pd

x = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
x_prev = x.shift(1)
x_next = x.shift(-1)
x_next2 = x.shift(-2)
df = pd.DataFrame(
    dict(start = np.flatnonzero((x_prev != 1) & (x == 1) & (x_next == 1)),
         end = np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0))))
print(df[['start', 'end']])

yields

   start  end
0      3    5
1      8   11
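
One caveat, as far as I can tell: the start condition marks the beginning of every run of 1s, so a series such as [1, 1, 0, 1, 1, 0, 0] would produce two starts but only one end, and the DataFrame constructor would then fail on unequal lengths. The sketch below is one possible way to handle that, assuming a recession only ends on two consecutive 0s as stated in the question. The function name recession_spans is made up for illustration, and a recession that is still open at the end of the series is simply not reported.

```python
import numpy as np
import pandas as pd


def recession_spans(x):
    """Pair each recession start with the first end that follows it.

    Hypothetical helper, not part of pandas or NumPy.
    """
    x = pd.Series(x).reset_index(drop=True)
    x_next = x.shift(-1)
    x_next2 = x.shift(-2)
    # Candidate starts: two declines in a row (can also fire mid-recession).
    starts = np.flatnonzero((x == 1) & (x_next == 1))
    # Candidate ends: a decline followed by two increases.
    ends = np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0))

    spans = []
    last_end = -1
    for end in ends:
        # First candidate start after the previous recession closed.
        after = starts[starts > last_end]
        if len(after) and after[0] <= end:
            spans.append((int(after[0]), int(end)))
            last_end = end
    return pd.DataFrame(spans, columns=['start', 'end'])


print(recession_spans([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]))
print(recession_spans([1, 1, 0, 1, 1, 0, 0]))
```

For the series in the question this gives the same two spans, (3, 5) and (8, 11); for the second series it reports a single recession from 0 to 4, because the lone increase at index 2 does not end it.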

unutbu answered Sep 20 '22 13:09


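You can use shift:

import pandas as pd

df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1], columns=['signal'])
df_prev = df.shift(1)['signal']
df_next = df.shift(-1)['signal']
df_next2 = df.shift(-2)['signal']
# Flag the first decline of each run of at least two declines.
df.loc[(df_prev != 1) & (df['signal'] == 1) & (df_next == 1), 'start'] = 1
# Flag the last decline before two consecutive increases.
df.loc[(df['signal'] != 0) & (df_next == 0) & (df_next2 == 0), 'end'] = 1
# Rows that were not flagged are NaN; turn them into 0 and restore integer dtype.
df.fillna(0, inplace=True)
df = df.astype(int)
print(df)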

    signal  start  end
0        0      0    0
1        1      0    0
2        0      0    0
3        1      1    0
4        1      0    0
5        1      0    1
6        0      0    0
7        0      0    0
8        1      1    0
9        1      0    0
10       0      0    0
11       1      0    1
12       0      0    0
13       0      0    0
14       1      0    0
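
If you want the index values themselves rather than flag columns, something like the following should work. This is a minimal sketch of the same shift conditions, not a different method: it casts the boolean masks directly to int instead of going through fillna, and the variable names start_idx and end_idx are my own.

```python
import pandas as pd

df = pd.DataFrame({'signal': [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]})
signal = df['signal']
prev_, next_, next2 = signal.shift(1), signal.shift(-1), signal.shift(-2)

# Boolean masks can be cast directly, avoiding the fillna/astype round trip.
df['start'] = ((prev_ != 1) & (signal == 1) & (next_ == 1)).astype(int)
df['end'] = ((signal == 1) & (next_ == 0) & (next2 == 0)).astype(int)

# Index labels where each flag is set.
start_idx = df.index[df['start'] == 1].tolist()
end_idx = df.index[df['end'] == 1].tolist()
print(start_idx)  # [3, 8]
print(end_idx)    # [5, 11]
```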

Dennis Golomazov answered Sep 18 '22 13:09