 

Combine Consecutive Rows with the Same column values

Tags:

python

pandas

I have something that looks like this. How do I go from this:

    0             d
0   The         DT
1   Skoll       ORGANIZATION
2   Foundation  ORGANIZATION
3   ,           ,
4   based       VBN
5   in          IN
6   Silicon     LOCATION
7   Valley      LOCATION

to this:

    0                       d
0   The                     DT
1   Skoll Foundation        ORGANIZATION
3   ,                       ,
4   based                   VBN
5   in                      IN
6   Silicon Valley          LOCATION
user3314418 asked Aug 05 '14 19:08



1 Answer

@rfan's answer of course works. As an alternative, here's an approach using pandas groupby.

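Here the words are in a column named 'a' and the tags in a column named 'b' (standing in for the question's 0 and d columns). The .groupby() groups the rows by the 'b' column; sort=False is necessary to keep the groups in their original order. The .apply() then applies a function to each group's 'a' values, in this case joining the strings together, separated by spaces.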

In [67]: df.groupby('b', sort=False)['a'].apply(' '.join)
Out[67]: 

b
DT                       The
Org         Skoll Foundation
,                          ,
VBN                    based
IN                        in
Location      Silicon Valley
Name: a, dtype: object
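For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption on my part: the column names 'a' and 'b' and the shortened tags are taken from the output above, not from the question's original frame.

```python
import pandas as pd

# Assumed input: tokens in 'a', tags in 'b' (names taken from the answer's output).
df = pd.DataFrame({
    'a': ['The', 'Skoll', 'Foundation', ',', 'based', 'in', 'Silicon', 'Valley'],
    'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN', 'Location', 'Location'],
})

# sort=False keeps the groups in order of first appearance instead of sorting by tag.
combined = df.groupby('b', sort=False)['a'].apply(' '.join)
print(combined)
```

Note that this groups every row sharing a tag, consecutive or not, so it only matches the desired output while each tag appears in a single run.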

EDIT:

To handle the more general case (repeated non-consecutive values), one approach is to first add a sentinel column that tracks which run of consecutive values each row belongs to, like this:

df['key'] = (df['b'] != df['b'].shift(1)).astype(int).cumsum()
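To see what that expression produces, here is a small sketch on a throwaway Series (the `tags` data is made up for illustration). Each row is compared with the previous one, and the running sum of the "changed" flags gives every run of equal values its own number.

```python
import pandas as pd

# Hypothetical tag sequence: 'Org' appears in two separate runs.
tags = pd.Series(['DT', 'Org', 'Org', ',', 'Org'])

# True wherever the value differs from the previous row
# (the first row is compared with NaN, so it always starts a run).
starts_run = tags != tags.shift(1)

# The cumulative sum turns those flags into a run counter.
key = starts_run.astype(int).cumsum()
print(key.tolist())  # [1, 2, 2, 3, 4]
```

The two 'Org' runs end up with different keys (2 and 4), which is what keeps them apart in the groupby.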

Then add the key to the groupby, and it should work even when a value appears in more than one run. For example, with this dummy data with repeats:

from pandas import DataFrame

df = DataFrame({'a': ['The', 'Skoll', 'Foundation', ',', 
                      'based', 'in', 'Silicon', 'Valley', 'A', 'Foundation'], 
                'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN', 
                      'Location', 'Location', 'Org', 'Org']})

Applying the groupby (after creating the key column on this df as shown above):

In [897]: df.groupby(['key', 'b'])['a'].apply(' '.join)
Out[897]: 
key  b       
1    DT                       The
2    Org         Skoll Foundation
3    ,                          ,
4    VBN                    based
5    IN                        in
6    Location      Silicon Valley
7    Org             A Foundation
Name: a, dtype: object
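If you want a DataFrame shaped like the one in the question (one row per run, labelled with the index of the run's first row), something along these lines might work. This is only a sketch: `combine_runs` is a hypothetical helper name, and the 'a'/'b' column names are the same assumed ones as above.

```python
import pandas as pd

def combine_runs(df, text_col, tag_col):
    """Join text_col over each run of consecutive equal tag_col values."""
    # True on the first row of every run of equal tags.
    starts = df[tag_col] != df[tag_col].shift()
    # Number the runs; renamed so the grouper does not clash with tag_col.
    run_id = starts.cumsum().rename('run')
    out = df.groupby(run_id).agg({text_col: ' '.join, tag_col: 'first'})
    # Label each combined row with the original index of the run's first row.
    out.index = df.index[starts.to_numpy()]
    return out

df = pd.DataFrame({
    'a': ['The', 'Skoll', 'Foundation', ',', 'based', 'in',
          'Silicon', 'Valley', 'A', 'Foundation'],
    'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN',
          'Location', 'Location', 'Org', 'Org'],
})

result = combine_runs(df, 'a', 'b')
print(result)
```

Grouping on the run number alone is enough here, since every row in a run has the same tag by construction; the tag is carried along with 'first'.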
chrisb answered Nov 02 '22 18:11
