
Group pandas dataframe in unusual way

Problem

I have the following Pandas dataframe:

    import pandas

    data = {
        'ID':  [100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300],
        'value': [False, False, True, False, False, True, True, True, False, False, False, True, True, False],
    }
    df = pandas.DataFrame(data, columns=['ID', 'value'])

I want to get the following groups:

  • Group 1: for each ID, all False rows until the first True row of that ID
  • Group 2: for each ID, all False rows after the last True row of that ID
  • Group 3: all True rows

[image: the three expected groups, shown row by row]

Can this be done with pandas?
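Stated procedurally, the rule per ID is: Falses before the first True go to group 1, Falses after the last True go to group 2, and Trues go to group 3. A plain-Python sketch of the intended labels (not a pandas solution; `label` is a hypothetical helper, and it assumes every ID contains at least one True, as in the sample data):

```python
def label(values):
    # Position of the first and last True in this ID's values.
    first = values.index(True)
    last = len(values) - 1 - values[::-1].index(True)
    # Trues -> 3; Falses before the first True -> 1; the rest -> 2.
    return [3 if v else (1 if i < first else 2)
            for i, v in enumerate(values)]

print(label([False, False, True, False]))  # [1, 1, 3, 2]
```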

What I've tried

I've tried

group = df.groupby((df['value'].shift() != df['value']).cumsum())

but this returns an incorrect result.
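The key built that way restarts on every change of `value` across the whole frame, ignoring `ID`, so it produces one group per run of equal values rather than the three requested groups. A quick check, reusing the question's data:

```python
import pandas as pd

data = {
    'ID':  [100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300],
    'value': [False, False, True, False, False, True, True, True, False, False, False, True, True, False],
}
df = pd.DataFrame(data)

# Global run-length key: increments on every change of 'value',
# regardless of ID boundaries.
key = (df['value'].shift() != df['value']).cumsum()
print(key.nunique())  # 7 run-length groups, not the 3 wanted
```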

Ford1892 asked Sep 29 '20 14:09



3 Answers

Let us try shift + cumsum to create the groupby key. (BTW, I really like the way you display your expected output.)

s = df.groupby('ID')['value'].apply(lambda x : x.ne(x.shift()).cumsum())
d = {x : y for x ,y in df.groupby(s)}
d[2]
     ID  value
2   100   True
5   200   True
6   200   True
7   200   True
11  300   True
12  300   True
d[1]
     ID  value
0   100  False
1   100  False
4   200  False
9   300  False
10  300  False
d[3]
     ID  value
3   100  False
8   200  False
13  300  False
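One caveat worth noting: the per-ID keys 1/2/3 line up with "leading Falses / Trues / trailing Falses" only because every ID in the sample starts with False and has a single contiguous block of Trues. A minimal sketch of what happens otherwise:

```python
import pandas as pd

# An ID whose values start with True: its True block gets key 1,
# not key 2, so the groups would no longer align across IDs.
edge = pd.Series([True, True, False], name='value')
print(edge.ne(edge.shift()).cumsum().tolist())  # [1, 1, 2]
```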
BENY answered Oct 11 '22 11:10


Let's try following your logic:

# 1. all False up to first True
group1 = df.loc[df.groupby('ID')['value'].cumsum() == 0]

# 2. all False after last True
group2 = df.loc[df.iloc[::-1].groupby('ID')['value'].cumsum()==0]

# 3. all True
group3 = df[df['value']]

Output:

    ID      value
0   100     False
1   100     False
4   200     False
9   300     False
10  300     False

    ID      value
3   100     False
8   200     False
13  300     False

    ID      value
2   100     True
5   200     True
6   200     True
7   200     True
11  300     True
12  300     True
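The trick behind groups 1 and 2 can be seen on a single ID's values: a cumulative sum of booleans stays at 0 until the first True appears, and reversing the series applies the same test from the tail end. A small sketch:

```python
import pandas as pd

s = pd.Series([False, False, True, False])  # one ID's values

# cumsum == 0 marks exactly the Falses before the first True.
print(s.cumsum().tolist())        # [0, 0, 1, 1]

# Reversed, the same test marks the Falses after the last True.
print(s[::-1].cumsum().tolist())  # [0, 1, 1, 1]
```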
Quang Hoang answered Oct 11 '22 12:10


This works for your example data:

df['groups'] = df.groupby('ID')['value'].apply(lambda x: x.diff().ne(False).cumsum()).astype('int')
for _, df_groups in df.groupby('groups'):
    print(df_groups)
    print('-' * 20)

Out:

     ID  value  groups
0   100  False       1
1   100  False       1
4   200  False       1
9   300  False       1
10  300  False       1
--------------------
     ID  value  groups
2   100   True       2
5   200   True       2
6   200   True       2
7   200   True       2
11  300   True       2
12  300   True       2
--------------------
     ID  value  groups
3   100  False       3
8   200  False       3
13  300  False       3
--------------------
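A note on the key: on a boolean series, `diff()` acts as an XOR with the previous element (NaN for the first), so `.diff().ne(False)` flags every value change, equivalent to the `x.ne(x.shift())` used in the other answer. A quick check:

```python
import pandas as pd

x = pd.Series([False, True, True, False])

# Both expressions mark the positions where the value changes
# (the first position always counts as a change).
print(x.diff().ne(False).tolist())  # [True, True, False, True]
print(x.ne(x.shift()).tolist())     # [True, True, False, True]
```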
Michael Szczesny answered Oct 11 '22 11:10