
Specific explode column

Tags:

python

pandas

I have a dataset like this:

import pandas as pd

data = {'id': ['1','2'],
       'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009']}
df = pd.DataFrame(data)

Output:

    id  seq
0   1   1, 2, 001
1   2   2, 5, 4, 5, 8, 009

I want to get:

new_data = {'id': ['1', '1','2','2','2','2','2'],
       'seq': ['1, 001','1, 2, 001','2, 009','2, 5, 009','2, 5, 4, 009','2, 5, 4, 5, 009','2, 5, 4, 5, 8, 009']}
new_df = pd.DataFrame(new_data)

Output:

    id  seq
0   1   1, 001
1   1   1, 2, 001
2   2   2, 009
3   2   2, 5, 009
4   2   2, 5, 4, 009
5   2   2, 5, 4, 5, 009
6   2   2, 5, 4, 5, 8, 009

I started from explode:

df.assign(seq=df.seq.str.split(r',\s*')).explode('seq')

Now I have no idea how to continue. I would be glad to hear your comments.

savchart asked Nov 17 '25 11:11


2 Answers

Use a nested list comprehension that builds every prefix of the split list, appends the last value to it and joins the result, then assign the lists to the column and explode:

a = [[', '.join(x[:i] + [x[-1]]) for i in range(1, len(x))]
     for x in df.seq.str.split(r',\s*')]

df = df.assign(seq=a).explode('seq')
print (df)
  id                 seq
0  1              1, 001
0  1           1, 2, 001
1  2              2, 009
1  2           2, 5, 009
1  2        2, 5, 4, 009
1  2     2, 5, 4, 5, 009
1  2  2, 5, 4, 5, 8, 009
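
The output above keeps the original row labels (0, 0, 1, 1, ...), while the question shows a fresh 0 to 6 index. A minimal sketch of the same idea with the comprehension pulled out into a helper and the index reset; the name prefixes_with_last is made up here for illustration and is not part of pandas:

```python
import pandas as pd

def prefixes_with_last(items):
    """Hypothetical helper: each proper prefix of items plus the last item, joined."""
    # i runs over prefix lengths 1 .. len(items) - 1, so the last item is never doubled
    return [', '.join(items[:i] + items[-1:]) for i in range(1, len(items))]

df = pd.DataFrame({'id': ['1', '2'],
                   'seq': ['1, 2, 001', '2, 5, 4, 5, 8, 009']})

out = (df.assign(seq=df['seq'].str.split(r',\s*').map(prefixes_with_last))
         .explode('seq')
         .reset_index(drop=True))  # renumber rows 0..n-1 as in the question
print(out)
```

Whether to reset the index depends on what comes next; keeping the repeated labels can be handy if you still need to trace rows back to the source frame.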

Alternative solution that also handles rows with a single value:

data = {'id': ['1','2', '3'],
       'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009', '1']}
df = pd.DataFrame(data)
print (df)
  id                 seq
0  1           1, 2, 001
1  2  2, 5, 4, 5, 8, 009
2  3                   1

a = [[', '.join(x[:i] + x[-1:]) for i in range(1, len(x))]
      if len(x) > 1 else x for x in df.seq.str.split(r',\s*')]

df = df.assign(seq=a).explode('seq')
print (df)
  id                 seq
0  1              1, 001
0  1           1, 2, 001
1  2              2, 009
1  2           2, 5, 009
1  2        2, 5, 4, 009
1  2     2, 5, 4, 5, 009
1  2  2, 5, 4, 5, 8, 009
2  3                   1
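
To see why the len(x) > 1 check matters, here is a small sketch (the two-row frame is made up for the comparison). A one-item list has no proper prefix, so the comprehension returns an empty list, and explode turns an empty list into NaN:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '3'], 'seq': ['1, 2, 001', '1']})
parts = df['seq'].str.split(r',\s*')

# No guard: the single-item row produces [] because range(1, 1) is empty
unguarded = [[', '.join(x[:i] + x[-1:]) for i in range(1, len(x))] for x in parts]

# Guard: fall back to the original one-item list when there is nothing to expand
guarded = [u if len(x) > 1 else x for u, x in zip(unguarded, parts)]

no_guard = df.assign(seq=unguarded).explode('seq')
with_guard = df.assign(seq=guarded).explode('seq')
print(no_guard)
print(with_guard)
```

Without the guard the row for id 3 is still there, but its seq value is lost.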

jezrael answered Nov 18 '25 23:11


You can use str.split, then apply a list comprehension to each resulting list, and finally explode, as shown below:

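import pandas as pd

data = {'id': ['1','2'],
       'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009']}
new_df = pd.DataFrame(data)
# Splitting on "," leaves the leading space on each later item, so joining with "," restores ", "
new_df['seq'] = new_df.seq.str.split(",").apply(lambda arr: [','.join(arr[:i] + arr[-1:]) for i in range(1,len(arr))])
print(new_df.explode('seq'))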
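
Note that this relies on every comma in the input being followed by exactly one space. If the spacing is not that regular, one option is to strip each piece and join with ', ' explicitly. A rough sketch, where the irregular sample data and the function name expand are invented for illustration:

```python
import pandas as pd

# Made-up input with inconsistent spacing around the commas
new_df = pd.DataFrame({'id': ['1', '2'],
                       'seq': ['1,2 ,  001', '2, 5,009']})

def expand(seq):
    # Split on commas and drop surrounding whitespace from each piece
    arr = [part.strip() for part in seq.split(',')]
    # Each proper prefix followed by the last item, joined with a uniform separator
    return [', '.join(arr[:i] + arr[-1:]) for i in range(1, len(arr))]

result = new_df.assign(seq=new_df['seq'].apply(expand)).explode('seq')
print(result)
```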

Dev Khadka answered Nov 19 '25 01:11


