
Counting the number of consecutive values that meet a condition (Pandas Dataframe)

So I created this post regarding my problem 2 days ago and got an answer thankfully.

I have a dataset of 20 rows and 2500 columns. Each column is a unique product, and the rows are a time series of measurement results. So each product is measured 20 times, and there are 2500 products.

This time I want to know for how many consecutive rows my measurement result stays above a specific threshold. In other words, I want to count the number of consecutive values that are above a given value, let's say 5.

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2] Here the values above 5 are 6, 8, 7 and 6, 10, and according to what I defined above, I should get NumofConsFeature = 3 as the result (taking the max if more than one series meets the condition).

I thought of filtering using .gt, then getting the indexes and looping over them to detect consecutive index numbers, but I couldn't make it work.

As a second step, I'd like to know the index of the first value of that consecutive series. For the above example, that would be 3 (counting from 1). But I have no idea how to do this one.

Thanks in advance.

meliksahturker asked Oct 05 '18 18:10





1 Answer

Here's another answer using only Pandas functions:

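import pandas as pd

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns=['foo'])
a['is_large'] = a.foo > 5
# Run id: increments each time is_large flips between False and True
a['crossing'] = (a.is_large != a.is_large.shift()).cumsum()
# Within each run, count down to 1 so the first row of a run holds its length
a['count'] = a.groupby(['is_large', 'crossing']).cumcount(ascending=False) + 1
# Rows that are not above the threshold get 0
a.loc[~a.is_large, 'count'] = 0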

which gives

    foo  is_large  crossing  count
0     1     False         1      0
1     2     False         1      0
2     6      True         2      3
3     8      True         2      2
4     7      True         2      1
5     3     False         3      0
6     2     False         3      0
7     3     False         3      0
8     6      True         4      2
9    10      True         4      1
10    2     False         5      0
11    1     False         5      0
12    0     False         5      0
13    2     False         5      0

From there on you can easily find the maximum of the count column, which is the length of the longest run, and its index, which is where that run starts.
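As a rough sketch of that last step, the same logic can be wrapped in a helper and run once per column. The function name longest_run_above and the column names p1 and p2 are made up for illustration, and the threshold of 5 comes from the question. Note that idxmax returns the 0-based index label, so the example run starts at 2 here rather than 3.

```python
import pandas as pd

def longest_run_above(values, threshold=5):
    """Return (length, start_label) of the longest run of values > threshold.

    Hypothetical helper, not part of pandas. Returns (0, None) if no value
    exceeds the threshold.
    """
    s = pd.Series(values)
    is_large = s > threshold
    if not is_large.any():
        return 0, None
    # Run id: increments each time is_large flips
    crossing = (is_large != is_large.shift()).cumsum()
    # Count down within each run so the first row of a run holds its length
    count = is_large.groupby(crossing).cumcount(ascending=False) + 1
    # Zero out rows that are not above the threshold
    count = count.where(is_large, 0)
    # idxmax returns the first label holding the maximum, i.e. the run start
    return int(count.max()), count.idxmax()

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
length, start = longest_run_above(A)

# One column per product, as in the question (toy data with two products)
df = pd.DataFrame({'p1': A, 'p2': [0] * len(A)})
results = {name: longest_run_above(col) for name, col in df.items()}
```

For the list A this should give a length of 3 starting at label 2. When several runs tie for the longest, idxmax picks the earliest one.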

Bart answered Sep 20 '22 06:09
