
Perform a Python Split on a Pandas Dataframe

Tags: python, pandas

I have the following dataframe:

import pandas as pd

data = {'Test_Step_ID': ['9.1.1', '9.1.2', '9.1.3', '9.1.4'],
        'Protocol_Name': ['A', 'B', 'C', 'D'],
        'Req_ID': ['SRS_0081d', 'SRS_0079', 'SRS_0082SRS_0082a', 'SRS_0015SRS_0015cSRS_0015d']
        }
df = pd.DataFrame(data)

I want to duplicate rows based on the "Req_ID" column: wherever a cell contains several "SRS" values run together, each value should get its own row, with all other column values kept the same. So I want two rows for SRS_0082 and SRS_0082a, and three rows for SRS_0015, SRS_0015c and SRS_0015d.

Can someone help me here? I'd appreciate the help. Thanks in advance.

[EDITED]: I want the result to look like this: [image not available]

ruser asked Oct 16 '25 01:10


2 Answers

Split on the zero-width position between "SRS" and the character preceding it, using the regex '(?<=.)(?=SRS)', then explode:

out = (df
  .assign(Req_ID=df['Req_ID'].str.split(r'(?<=.)(?=SRS)'))
  .explode('Req_ID')
 )

Output:

  Test_Step_ID Protocol_Name     Req_ID
0        9.1.1             A  SRS_0081d
1        9.1.2             B   SRS_0079
2        9.1.3             C   SRS_0082
2        9.1.3             C  SRS_0082a
3        9.1.4             D   SRS_0015
3        9.1.4             D  SRS_0015c
3        9.1.4             D  SRS_0015d
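
Note that the exploded rows keep the index of the row they came from, which is why 2 and 3 repeat in the output above. If a fresh 0..6 index is preferred, one option is the `ignore_index` argument of `explode`. This is only a sketch, and it assumes pandas 1.1 or later, where that argument is available:

```python
import pandas as pd

df = pd.DataFrame({
    'Test_Step_ID': ['9.1.1', '9.1.2', '9.1.3', '9.1.4'],
    'Protocol_Name': ['A', 'B', 'C', 'D'],
    'Req_ID': ['SRS_0081d', 'SRS_0079', 'SRS_0082SRS_0082a',
               'SRS_0015SRS_0015cSRS_0015d'],
})

out = (df
  .assign(Req_ID=df['Req_ID'].str.split(r'(?<=.)(?=SRS)'))
  # ignore_index=True renumbers the exploded rows instead of
  # repeating the original row labels (assumes pandas >= 1.1)
  .explode('Req_ID', ignore_index=True)
 )
print(out)
```

On older pandas versions, calling `.reset_index(drop=True)` after `explode` should give the same renumbering.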

Regex:

(?<=.)  # lookbehind: some character must precede the split point (so no split at the start of the string)
(?=SRS) # lookahead: "SRS" must follow the split point
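
To see what the two lookarounds do without pandas, the same pattern can be tried with the standard `re` module. This is a minimal sketch that assumes Python 3.7 or later, where `re.split` is able to split on zero-width matches; the `results` name is only for illustration:

```python
import re

# Same pattern as above: a split point must have some character before it
# and the literal text "SRS" right after it.
pattern = re.compile(r'(?<=.)(?=SRS)')

req_ids = ['SRS_0081d', 'SRS_0082SRS_0082a', 'SRS_0015SRS_0015cSRS_0015d']

# Map each original string to its list of pieces (illustrative helper dict)
results = {s: pattern.split(s) for s in req_ids}

for original, pieces in results.items():
    print(original, '->', pieces)
# SRS_0081d -> ['SRS_0081d']
# SRS_0082SRS_0082a -> ['SRS_0082', 'SRS_0082a']
# SRS_0015SRS_0015cSRS_0015d -> ['SRS_0015', 'SRS_0015c', 'SRS_0015d']
```

Because the match is zero-width, nothing is consumed by the split, so every piece keeps its "SRS" prefix. The lookbehind is what prevents an empty first piece at the start of each string.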


mozway answered Oct 18 '25 14:10


I have modified your code; you can try this:

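# Split on the literal prefix; every resulting list starts with an empty string
df['Req_ID'] = df['Req_ID'].str.split('SRS_')

# One row per list element
df = df.explode('Req_ID')

# Drop the empty strings left over from the leading prefix
df['Req_ID'] = df['Req_ID'].str.strip()
df = df[df['Req_ID'].ne('')]

# Restore the prefix that the split consumed
df['Req_ID'] = 'SRS_' + df['Req_ID']

print(df)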
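
The filter on empty strings and the final re-prefixing are needed because splitting on the literal 'SRS_' consumes the prefix and leaves an empty first element. A plain-Python sketch of what happens to a single cell (the variable names here are illustrative only):

```python
cell = 'SRS_0015SRS_0015cSRS_0015d'

# The separator itself is removed, and the text before the first
# 'SRS_' is the empty string.
parts = cell.split('SRS_')
print(parts)    # ['', '0015', '0015c', '0015d']

# Drop the empty piece and put the prefix back on the rest.
req_ids = ['SRS_' + p for p in parts if p != '']
print(req_ids)  # ['SRS_0015', 'SRS_0015c', 'SRS_0015d']
```

This approach relies on every ID starting with exactly 'SRS_'; the lookaround regex in the other answer avoids the cleanup steps because it does not consume any characters.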
Pravash Panigrahi answered Oct 18 '25 14:10


