I have a pandas.DataFrame with measurements taken at consecutive points in time. Along with each measurement the system under observation had a distinct state at each point in time. Hence, the DataFrame also contains a column with the state of the system at each measurement. State changes are much slower than the measurement interval. As a result, the column indicating the states might look like this (index: state):
1: 3
2: 3
3: 3
4: 3
5: 4
6: 4
7: 4
8: 4
9: 1
10: 1
11: 1
12: 1
13: 1
Is there an easy way to retrieve the indices of each segment of consecutively equal states? That is, I would like to get something like this:
[[1,2,3,4], [5,6,7,8], [9,10,11,12,13]]
The result does not have to be plain lists; any container type would do.
The only solution I have come up with so far is to iterate manually over the rows, find the segment change points, and reconstruct the indices from those change points, but I hope there is an easier way.
One-liner:
df.reset_index().groupby('A')['index'].apply(np.array)
Code for example:
In [1]: import numpy as np

In [2]: from pandas import *

In [3]: df = DataFrame([3]*4+[4]*4+[1]*4, columns=['A'])

In [4]: df
Out[4]:
    A
0   3
1   3
2   3
3   3
4   4
5   4
6   4
7   4
8   1
9   1
10  1
11  1

In [5]: df.reset_index().groupby('A')['index'].apply(np.array)
Out[5]:
A
1    [8, 9, 10, 11]
3      [0, 1, 2, 3]
4      [4, 5, 6, 7]
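For reference, the same session as a self-contained script using current pandas conventions (`import pandas as pd` rather than `from pandas import *`) might look like this:

```python
import numpy as np
import pandas as pd

# Build the example frame: three runs of states 3, 4, 1
df = pd.DataFrame({'A': [3]*4 + [4]*4 + [1]*4})

# Collect the row positions of each state value into an array
groups = df.reset_index().groupby('A')['index'].apply(np.array)

print(groups[3])  # positions where A == 3
```

Note that this groups by state *value*, so it only matches the desired output as long as each state occurs in a single consecutive run, which is the situation addressed further below.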
You can also directly access the information from the groupby object:
In [1]: grp = df.groupby('A')

In [2]: grp.indices
Out[2]:
{1L: array([ 8,  9, 10, 11], dtype=int64),
 3L: array([0, 1, 2, 3], dtype=int64),
 4L: array([4, 5, 6, 7], dtype=int64)}

In [3]: grp.indices[3]
Out[3]: array([0, 1, 2, 3], dtype=int64)
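The `1L` suffixes and `dtype=int64` reprs in that transcript are Python 2 / platform artifacts; on current Python and pandas the same attribute gives a plain dict of NumPy arrays:

```python
import pandas as pd

df = pd.DataFrame({'A': [3]*4 + [4]*4 + [1]*4})
grp = df.groupby('A')

# grp.indices maps each group key to the positional row indices of that group,
# without ever materializing the grouped frames
idx = grp.indices

print(idx[3])  # rows where A == 3
```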
To address the situation that DSM mentioned (the same state recurring in separate, non-adjacent segments, which a plain `groupby('A')` would merge into one group), you could do something like:
In [1]: df['block'] = (df.A.shift(1) != df.A).astype(int).cumsum()

In [2]: df
Out[2]:
    A  block
0   3      1
1   3      1
2   3      1
3   3      1
4   4      2
5   4      2
6   4      2
7   4      2
8   1      3
9   1      3
10  1      3
11  1      3
12  3      4
13  3      4
14  3      4
15  3      4
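The idea behind the `shift`/`cumsum` trick: comparing each value to the previous row marks the start of every new run with `True`, and a cumulative sum over those booleans assigns each run its own integer label. A minimal standalone sketch:

```python
import pandas as pd

# State 3 occurs in two separate runs, so grouping by value alone would merge them
df = pd.DataFrame({'A': [3]*4 + [4]*4 + [1]*4 + [3]*4})

# True wherever the value differs from the previous row (the first row compares
# against NaN and is therefore also True); cumsum turns these change points
# into consecutive run labels 1, 2, 3, ...
df['block'] = (df['A'].shift(1) != df['A']).astype(int).cumsum()

print(df['block'].tolist())
```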
Now group by both columns and apply `np.array`:
In [77]: df.reset_index().groupby(['A','block'])['index'].apply(np.array)
Out[77]:
A  block
1  3        [8, 9, 10, 11]
3  1          [0, 1, 2, 3]
   4      [12, 13, 14, 15]
4  2          [4, 5, 6, 7]
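If the goal is exactly the list-of-segments output from the question, in original order, one sketch is to group by the run label alone (grouping by `['A', 'block']` sorts by state value first):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3]*4 + [4]*4 + [1]*4 + [3]*4})

# Label each consecutive run of equal states
block = (df['A'].shift(1) != df['A']).astype(int).cumsum()

# Grouping by the run label alone keeps segments in their original order
segments = df.reset_index().groupby(block)['index'].apply(np.array).tolist()

print(segments)
```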