Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

concatenate several lines in one with a delimiter with pandas

Tags:

python

pandas

I want to be able to concatenate string on several lines into one according to an ID. I use the library pandas (python 3).

val   id
Cat   1
Tiger 2
Ball  3
Bat   1
bill  2
dog   1

l = []
a = 0
while a < lendata:
    if df["id"][a] == 1:
        if a != 0:
            df["val"][tmp] = ' '.join(l)
            l = []
        tmp = a
        l.append(df["val"][a])
    else:
        l.append(df["val"][a])
    a += 1

It works with loops. i need this result,

val
Cat Tiger Ball
Bat bill
dog

not a group by

Question: Do you know how to do it with pandas functions? Thanks.

like image 541
jb255 Avatar asked Dec 10 '22 02:12

jb255


2 Answers

Staying in pandas:

df['group'] = (df['id'] == 1).cumsum()
df.groupby('group')['val'].apply(' '.join).reset_index()
   id             val
0   1  Cat Tiger Ball
1   2        Bat bill
2   3             dog

The first line defines groups according to your definition. The second line is a standard groupby operation.

like image 101
IanS Avatar answered Feb 02 '23 07:02

IanS


You can also create an array like so :

a = np.array(range(len(df)))

Then you create a third column which equals to your id minus the previous array. This third column will show you which val are together.

df['regroup'] = df['id'].subtract(a)

Out:

id  val regroup
0   1   Cat 1
1   2   Tiger   1
2   3   Ball    1
3   1   Bat -2

You can now use a group by to have your desired output :

In [1] : df.groupby(['regroup'])['val'].apply(' '.join)
Out[1] : regroup
-2               Bat 
 1    Cat Tiger Ball 
like image 24
vlemaistre Avatar answered Feb 02 '23 07:02

vlemaistre