I have a question about grouping only certain rows together in a pandas dataframe (that is ordered by timestamp), depending on their column values.
So here is an example:
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Hello.",
        "I had a question.",
        "Hi!",
        "Yes how can I help?",
        "Do you ship to the UK?",
    ],
    "timestamp": [
        pd.Timestamp('20131213 11:50:00'),
        pd.Timestamp('20131213 11:51:00'),
        pd.Timestamp('20131213 11:52:00'),
        pd.Timestamp('20131213 11:53:00'),
        pd.Timestamp('20131213 11:54:00'),
    ],
    "direction": ["In", "In", "Out", "Out", "In"],
})
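This is what the dataframe looks like:
                     text           timestamp  direction
0                  Hello. 2013-12-13 11:50:00         In
1       I had a question. 2013-12-13 11:51:00         In
2                     Hi! 2013-12-13 11:52:00        Out
3     Yes how can I help? 2013-12-13 11:53:00        Out
4  Do you ship to the UK? 2013-12-13 11:54:00         In
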
This dataframe is ordered by timestamp and could be (for example) a chat thread where direction "In" could be one person talking and "Out" is another person talking.
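What I would like to get is something like this:
                       text  direction
0  Hello. I had a question.         In
1   Hi! Yes how can I help?        Out
2    Do you ship to the UK?         In
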
In the final dataframe, the text of consecutive rows is combined into one row when they share the same direction, but rows are only grouped together until a row with a different direction is reached. The order of the messages is retained.
Does anyone have any ideas?
Setup
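# How each column is collapsed within a group
operations = {
    'text': ' '.join,        # concatenate the messages, separated by a space
    'direction': 'first',    # every row in a group has the same direction
}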
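Use agg together with a common trick for grouping by consecutive values: comparing direction with its shifted copy flags the first row of each run, and cumsum turns those flags into a group number that increases every time the direction changes: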
df.groupby(df.direction.ne(df.direction.shift()).cumsum()).agg(operations)
                               text  direction
direction
1          Hello. I had a question.         In
2           Hi! Yes how can I help?        Out
3            Do you ship to the UK?         In
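To make the trick easier to follow, here is a minimal sketch that builds the grouping key step by step. The names starts_new_run, run_id and first_timestamp are only illustrative and are not part of the answer above. It also assumes a pandas version that supports named aggregation (0.25 or later), which is used here instead of the operations dict so that the timestamp of the first message in each run is kept as well.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Hello.", "I had a question.", "Hi!",
             "Yes how can I help?", "Do you ship to the UK?"],
    "timestamp": pd.date_range("2013-12-13 11:50:00", periods=5, freq="min"),
    "direction": ["In", "In", "Out", "Out", "In"],
})

# True wherever the direction differs from the previous row. The first row
# is always True because shift() leaves a missing value there.
starts_new_run = df["direction"].ne(df["direction"].shift())

# Running total of the True flags: each consecutive run gets its own id.
run_id = starts_new_run.cumsum().rename("run_id")

# Collapse each run into one row (named aggregation, an assumed alternative
# to passing a dict to agg).
result = df.groupby(run_id).agg(
    text=("text", " ".join),
    first_timestamp=("timestamp", "first"),
    direction=("direction", "first"),
)

print(run_id.tolist())  # [1, 1, 2, 2, 3]
print(result)
```

Because the key only ever increases, the groups come out in the original message order, and the last "In" row is not merged with the first two. If a plain 0-based index is preferred, calling reset_index(drop=True) on the result should do it.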