Split column of pandas dataframe based on multiple characters

I have a pandas DataFrame that looks like this:

    Un_ID  P_ID                              segment
0  Q8TDU6  7bw0  1( 16- 41), 2( 51- 73), 3( 86- 108)
1  P63092  7bw0  1( 16- 41), 2( 51- 73), 3( 86- 108)
2  Q8TDU6  7cfm  1( 22- 41), 2( 51- 72), 3( 86- 108)

I want to split the third column, 'segment', into three columns (TM, starting, ending), with one row per segment:

   Un_ID    P_ID                   segment                TM starting ending
0   Q8TDU6  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM1 16       41
1   P63092  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM1 16       41
2   Q8TDU6  7cfm    1( 22- 41), 2( 51- 72), 3( 86- 108)   TM1 22       41
0   Q8TDU6  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM2 51       73
1   P63092  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM2 51       73
2   Q8TDU6  7cfm    1( 22- 41), 2( 51- 72), 3( 86- 108)   TM2 51       72
0   Q8TDU6  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM3 86       108
1   P63092  7bw0    1( 16- 41), 2( 51- 73), 3( 86- 108)   TM3 86       108
2   Q8TDU6  7cfm    1( 22- 41), 2( 51- 72), 3( 86- 108)   TM3 86       108

I tried the following code:

df[['TM','starting','ending']] = df.segment.apply(lambda x: pd.Series(str(x).split(",")))

But I am not sure how to adapt the above code to get the DataFrame I want.

shome asked Jan 24 '23 07:01



1 Answer

Try:

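import re

import pandas as pd

# Capture the TM number, start and end from pieces such as "1( 16- 41)".
r = re.compile(r"(\d+)\(\s*(\d+)-\s*(\d+)\)")

# Turn each segment string into a list of (TM, start, end) tuples.
df["segment"] = df["segment"].apply(lambda x: r.findall(x))
# One row per tuple; the original index value is repeated.
df = df.explode("segment")
# Spread each tuple over three new columns (the values are still strings).
df[["TM", "starting", "ending"]] = df.pop("segment").apply(pd.Series)
df = df.sort_values(by="TM")
df["TM"] = "TM" + df["TM"].astype(str)
print(df)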

Prints:

    Un_ID  P_ID   TM starting ending
0  Q8TDU6  7bw0  TM1       16     41
1  P63092  7bw0  TM1       16     41
2  Q8TDU6  7cfm  TM1       22     41
0  Q8TDU6  7bw0  TM2       51     73
1  P63092  7bw0  TM2       51     73
2  Q8TDU6  7cfm  TM2       51     72
0  Q8TDU6  7bw0  TM3       86    108
1  P63092  7bw0  TM3       86    108
2  Q8TDU6  7cfm  TM3       86    108
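
To see what the regular expression captures before pandas gets involved, here is a minimal sketch that runs the same pattern on a single segment string. The helper name parse_segments and the conversion of the bounds to int are assumptions added for illustration; they are not part of the code above.

```python
import re

# Same pattern as above: TM number, then start and end inside the parentheses.
SEGMENT_RE = re.compile(r"(\d+)\(\s*(\d+)-\s*(\d+)\)")


def parse_segments(text):
    """Return (TM label, start, end) tuples; start and end are converted to int.

    Hypothetical helper, used only to illustrate the regex step.
    """
    return [
        (f"TM{tm}", int(start), int(end))
        for tm, start, end in SEGMENT_RE.findall(text)
    ]


rows = parse_segments("1( 16- 41), 2( 51- 73), 3( 86- 108)")
print(rows)
# [('TM1', 16, 41), ('TM2', 51, 73), ('TM3', 86, 108)]
```

Converting to int may matter if the data has ten or more segments per row or if the bounds are compared later: as strings, "10" sorts before "2".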
Andrej Kesely answered Jan 27 '23 03:01
