Group rows in an ordered pandas DataFrame depending on column values

Question

I have a question about grouping only certain rows together in a pandas dataframe (that is ordered by timestamp), depending on their column values.

So here is an example:

import pandas as pd

df = pd.DataFrame({"text": ["Hello.",
                            "I had a question.",
                            "Hi!",
                            "Yes how can I help?",
                            "Do you ship to the UK?"
                            ],
                   "timestamp": [
                            pd.Timestamp('20131213 11:50:00'),
                            pd.Timestamp('20131213 11:51:00'),
                            pd.Timestamp('20131213 11:52:00'),
                            pd.Timestamp('20131213 11:53:00'),
                            pd.Timestamp('20131213 11:54:00')
                            ],
                   "direction": ["In", "In", "Out", "Out", "In"]})


This dataframe is ordered by timestamp and could be, for example, a chat thread where direction "In" is one person talking and "Out" is the other.

What I would like to get is a final dataframe in which the text of consecutive rows is joined into one row when they have the same direction. Rows are only grouped together until a row with a different direction is reached, and the order of the messages is retained.

Does anyone have any ideas?

user3483203 · Accepted Answer

Setup

operations = {
    'text': ' '.join,
    'direction': 'first',
}

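Using agg and a common trick to group by consecutive values: compare each row's direction with the previous one (shift), flag the rows where it changes (ne), and take the cumulative sum (cumsum) so that every run of identical directions gets its own group key: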

df.groupby(df.direction.ne(df.direction.shift()).cumsum()).agg(operations)

                               text direction
direction
1          Hello. I had a question.        In
2           Hi! Yes how can I help?       Out
3            Do you ship to the UK?        In
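
To see why the grouping key works, the one-liner can be unpacked into its intermediate steps. This is only a sketch: the names starts_new_run, run_id and start are made up for illustration and are not part of the answer above, and the named-aggregation form of agg assumes pandas 0.25 or newer. It also keeps the first timestamp of each run, which the original answer does not do.

```python
import pandas as pd

# Same data as in the question, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "text": ["Hello.", "I had a question.", "Hi!",
             "Yes how can I help?", "Do you ship to the UK?"],
    "timestamp": pd.date_range("2013-12-13 11:50:00", periods=5, freq="min"),
    "direction": ["In", "In", "Out", "Out", "In"],
})

# Step 1: True wherever the direction differs from the previous row.
# The first row is compared against a missing value, so it is always True.
starts_new_run = df["direction"].ne(df["direction"].shift())

# Step 2: the running total of those flags labels each consecutive run:
# 1, 1, 2, 2, 3
run_id = starts_new_run.cumsum().rename("run_id")

# Step 3: group on the run label instead of on "direction" itself, so the
# two separate "In" runs are not merged. groupby keeps the row order.
result = (
    df.groupby(run_id)
      .agg(text=("text", " ".join),
           direction=("direction", "first"),
           start=("timestamp", "first"))   # illustrative extra column
      .reset_index(drop=True)
)

print(result)
```

Grouping on direction directly would collapse both "In" runs into a single row, which is exactly what the run label avoids.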

Tags:

python

python-3.x

pandas

Imu
