
Pandas find sequence or pattern in column

Here's some example data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1

The Sales_Growth column tells me if there was indeed sales growth in the quarter or not. 0 = no growth, 1 = growth.

First, I'm trying to return the first Quarter when there were two consecutive quarters of no sales growth.

With the data above this answer would be 2001q1.

Then, I want to return the 2nd quarter of consecutive sales growth that occurs AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I've searched, but I can't get the closest answer I found to work: https://stackoverflow.com/a/26539166/3225420

I am a Pandas beginner.

Python_Learner_DK asked Mar 02 '17 12:03



2 Answers

You're doing subsequence matching. This is a bit strange, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

That gives you:

'00100011'

Then:

growth.index('0011')

This returns 4, the position of the first row in the match. Add 3 (the pattern length minus one) to get the index of the last matched row, 7, which is 2004q4.

I feel this approach starts off a bit ugly, but the end result is really usable--you can search for any fixed pattern with no additional coding.
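
To make that concrete, here is a minimal sketch of the same string-matching idea run against the sample data from the question. The helper name find_pattern is made up for illustration, and it assumes Sales_Growth only ever holds the single-digit values 0 and 1, so that a position in the string equals a row position. It uses str.find rather than str.index so that a missing pattern comes back as -1 instead of raising.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def find_pattern(series, pattern, start=0):
    """Hypothetical helper: position of the first match at or after `start`, or -1."""
    # Join the 0/1 flags into one string so str.find can search it.
    flags = "".join(series.astype(str))
    return flags.find(pattern, start)


# First question: first row that starts two consecutive no-growth quarters.
first_flat = find_pattern(df["Sales_Growth"], "00")
q1 = df["Quarter"].iloc[first_flat]

# Second question: search for two growth quarters only after that pair,
# then step forward one row to land on the second growth quarter.
growth_start = find_pattern(df["Sales_Growth"], "11", start=first_flat + 2)
q2 = df["Quarter"].iloc[growth_start + 1]
```

In real use you would check for -1 before indexing, since iloc[-1] silently returns the last row rather than failing.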

John Zwinck answered Sep 19 '22 16:09


For Q1:

# A sum of 0 means this quarter and the next one both had no growth.
temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)
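
As a self-contained sketch of the Q1 step, using the sample data from the question: the variable names pair_sum, no_growth_starts and first_quarter are my own, and falling back to None when nothing matches is an assumption about what you would want.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})

# shift(-1) lines each row up with the row that follows it.
# The last row has no successor, so its sum is NaN and never equals 0.
pair_sum = df["Sales_Growth"] + df["Sales_Growth"].shift(-1)

# Every row that starts a pair of no-growth quarters.
no_growth_starts = df[pair_sum == 0]

# Take the Quarter label of the first such row, if there is one.
first_quarter = (
    no_growth_starts["Quarter"].iloc[0] if not no_growth_starts.empty else None
)
```

head(1) in the original returns a one-row DataFrame; pulling the Quarter value out with iloc[0] gives you the bare label instead.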

For Q2:

s = df.Sales_Growth
df[(s == 1) & (s.shift(1) == 1) & (s.shift(2) == 0) & (s.shift(3) == 0)].head(1)
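
Note that the filter above only matches when the two growth quarters come immediately after two no-growth quarters (0, 0, 1, 1). If "after" in the question is meant more loosely, a variant along these lines might fit better. This is only a sketch under that assumption: the function name second_growth_after_flat is hypothetical, and it works on row positions rather than index labels.

```python
import numpy as np
import pandas as pd


def second_growth_after_flat(df):
    """Hypothetical helper: Quarter ending the first 1,1 pair after the first 0,0 pair."""
    s = df["Sales_Growth"]
    # Positions of rows that start a 0,0 pair.
    flat_pos = np.flatnonzero(((s == 0) & (s.shift(-1) == 0)).to_numpy())
    # Positions of rows that end a 1,1 pair.
    grow_pos = np.flatnonzero(((s == 1) & (s.shift(1) == 1)).to_numpy())
    if flat_pos.size == 0:
        return None
    # Keep only growth pairs that end after the first no-growth pair.
    after = grow_pos[grow_pos > flat_pos[0] + 1]
    if after.size == 0:
        return None
    return df["Quarter"].iloc[after[0]]


# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})
result = second_growth_after_flat(df)
```

On the sample data both versions agree, but they differ on a series such as 0, 0, 1, 0, 1, 1, where the strict four-row filter finds nothing.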
languitar answered Sep 16 '22 16:09