 

How to use pandas to find consecutive same data in time series


Here is some time series data; call it df:

        No        Date  Value
0   600000  1999-11-10      1
1   600000  1999-11-11      1
2   600000  1999-11-12      1
3   600000  1999-11-15      1
4   600000  1999-11-16      1
5   600000  1999-11-17      1
6   600000  1999-11-18      0
7   600000  1999-11-19      1
8   600000  1999-11-22      1
9   600000  1999-11-23      1
10  600000  1999-11-24      1
11  600000  1999-11-25      0
12  600001  1999-11-26      1
13  600001  1999-11-29      1
14  600001  1999-11-30      0

I want the date range of each run of consecutive rows where 'Value' is 1. How can I get the following result?

       No   BeginDate     EndDate  Consecutive
0  600000  1999-11-10  1999-11-17            6
1  600000  1999-11-19  1999-11-24            4
2  600001  1999-11-26  1999-11-29            2
figo asked Nov 13 '14 14:11




1 Answer

This should do it:

df['value_grp'] = (df.Value.diff(1) != 0).astype('int').cumsum()

value_grp increments by one whenever Value changes, so every run of identical values gets its own group id. You can then extract the per-group results:

pd.DataFrame({'BeginDate': df.groupby('value_grp').Date.first(),
              'EndDate': df.groupby('value_grp').Date.last(),
              'Consecutive': df.groupby('value_grp').size(),
              'No': df.groupby('value_grp').No.first()}).reset_index(drop=True)
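
For reference, here is a minimal end-to-end sketch of the same run-id idea on the sample data from the question, assuming pandas 0.25 or later (for named aggregation). Two details go beyond the answer as written and are assumptions on my part: the run id also breaks when No changes, and rows with Value 0 are filtered out before grouping, so that only the runs of 1 appear in the result.

```python
import pandas as pd

# Rebuild the question's sample data (values copied from the post).
df = pd.DataFrame({
    'No': [600000] * 12 + [600001] * 3,
    'Date': pd.to_datetime([
        '1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
        '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
        '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30',
    ]),
    'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0],
})

# A new run starts whenever Value or No differs from the previous row.
# (Breaking on No as well is an assumption, not part of the answer above.)
new_run = (df['Value'] != df['Value'].shift()) | (df['No'] != df['No'].shift())
df['value_grp'] = new_run.cumsum()

# Keep only the rows belonging to runs of 1, then summarise each run.
ones = df[df['Value'] == 1]
result = (
    ones.groupby('value_grp')
        .agg(No=('No', 'first'),
             BeginDate=('Date', 'first'),
             EndDate=('Date', 'last'),
             Consecutive=('Value', 'size'))
        .reset_index(drop=True)
)
print(result)
```

Comparing against shift() rather than testing diff() != 0 is a design choice: it works the same way for non-numeric columns and makes it easy to combine several "break here" conditions with |.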
user1827356 answered Sep 30 '22 18:09
