I have something that looks like this. How do I go from this:
0 d
0 The DT
1 Skoll ORGANIZATION
2 Foundation ORGANIZATION
3 , ,
4 based VBN
5 in IN
6 Silicon LOCATION
7 Valley LOCATION
to this:
0 d
0 The DT
1 Skoll Foundation ORGANIZATION
3 , ,
4 based VBN
5 in IN
6 Silicon Valley LOCATION
@rfan's answer works, of course. As an alternative, here's an approach using pandas groupby.
The .groupby() groups the data by the 'b' column; sort=False is necessary to keep the original order intact. The .apply() applies a function to the 'a' values of each group, in this case joining the strings together, separated by spaces.
In [67]: df.groupby('b', sort=False)['a'].apply(' '.join)
Out[67]:
b
DT The
Org Skoll Foundation
, ,
VBN based
IN in
Location Silicon Valley
Name: a, dtype: object
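For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The frame is an assumption: it reuses the 'a' (token) and 'b' (tag) column names and the shortened tag labels shown in the output, since the original frame construction isn't shown.

```python
import pandas as pd

# Assumed input: 'a' holds the tokens, 'b' holds the tags
# (shortened labels, as in the output shown above).
df = pd.DataFrame({
    "a": ["The", "Skoll", "Foundation", ",", "based", "in", "Silicon", "Valley"],
    "b": ["DT", "Org", "Org", ",", "VBN", "IN", "Location", "Location"],
})

# sort=False keeps groups in order of first appearance instead of
# sorting the tags alphabetically.
merged = df.groupby("b", sort=False)["a"].apply(" ".join)
print(merged)
```

Note that this groups every row sharing a tag, wherever it occurs, so it only gives the desired result when each tag appears in a single consecutive run.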
EDIT:
To handle the more general case (repeated, non-consecutive values), one approach is to first add a sentinel column that tracks which run of consecutive rows each row belongs to, like this:
df['key'] = (df['b'] != df['b'].shift(1)).astype(int).cumsum()
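To see why that one-liner labels consecutive runs, it may help to break it into steps on a tiny, made-up tag sequence (the `tags` series below is purely illustrative, not the data from the question):

```python
import pandas as pd

# Hypothetical tag sequence in which 'Org' appears in two separate runs.
tags = pd.Series(["DT", "Org", "Org", ",", "Org", "Org"])

# True wherever a tag differs from the one on the previous row.
# The first row is compared against a missing value, so it is True too.
changed = tags != tags.shift(1)

# Treating True as 1 and taking a running sum gives each run its own id.
key = changed.astype(int).cumsum()
print(key.tolist())  # [1, 2, 2, 3, 4, 4]
```

The two 'Org' runs end up with different ids (2 and 4), which is what keeps them apart once the key is part of the grouping.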
Then add the key to the groupby, and it should work even with repeated values. For example, take this dummy data with repeats (compute the key column as above once the frame exists):
from pandas import DataFrame

df = DataFrame({'a': ['The', 'Skoll', 'Foundation', ',',
                      'based', 'in', 'Silicon', 'Valley', 'A', 'Foundation'],
                'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN',
                      'Location', 'Location', 'Org', 'Org']})
Applying the groupby:
In [897]: df.groupby(['key', 'b'])['a'].apply(' '.join)
Out[897]:
key b
1 DT The
2 Org Skoll Foundation
3 , ,
4 VBN based
5 IN in
6 Location Silicon Valley
7 Org A Foundation
Name: a, dtype: object
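If the goal is a flat table like the one in the question rather than a Series with a two-level index, one possible follow-up is to reset the index and drop the helper column. This is only a sketch built on the dummy data above; the variable name `out` is my own.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["The", "Skoll", "Foundation", ",", "based", "in",
          "Silicon", "Valley", "A", "Foundation"],
    "b": ["DT", "Org", "Org", ",", "VBN", "IN",
          "Location", "Location", "Org", "Org"],
})

# Run id: increases by one each time the tag changes from the previous row.
df["key"] = (df["b"] != df["b"].shift(1)).astype(int).cumsum()

# Join the tokens within each run, then flatten back to plain columns.
out = (
    df.groupby(["key", "b"])["a"]
      .apply(" ".join)
      .reset_index()[["a", "b"]]
)
print(out)
```

Because the key is the first grouping level and increases with row position, the default sorted group order is the same as the original row order here.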