
Split a string within a pandas DataFrame element and recombine a section of the list

I am trying to figure out how to split a string within a pandas element, then recombine a section of the split string. I have the following code:

import pandas as pd

df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN',
                            'PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER']})

df['domain'] = df['code'].str.split("_")

This code works for splitting the string on the underscore. Now I would like to take the resulting split list within the column and recombine the first three elements such that:

PC001-S002_D_CFI4-1_NN ==> PC001-S002_D_CFI4-1

I can do this for a single string using:

a = 'PC001-S002_D_CFI4-1_NN'
b = a.split("_")[0:3]
c = "_".join(b)

However, I have not managed to apply the same logic to the pandas column.

Any advice would be gratefully received.

BillyJo_rambler asked Nov 27 '25 04:11


2 Answers

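You can use Series.str.rsplit(...) to split once from the right and keep the first part (in pandas 2.0 and later, n must be passed by keyword):

In [11]: df['domain'] = df['code'].str.rsplit('_', n=1).str[0]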

In [12]: df
Out[12]:
                     code               domain
0  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
1  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
2  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
3  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
4  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
5  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1

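Or just remove the last section with a regular expression (pass regex=True, since pandas 2.0 and later treat the pattern as a literal string by default):

In [7]: df['domain'] = df['code'].str.replace(r'_\w+?$', '', regex=True)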

In [8]: df
Out[8]:
                     code               domain
0  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
1  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
2  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
3  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
4  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
5  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
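
One caveat about the regular expression, shown here as a minimal sketch rather than a definitive fix: \w also matches an underscore, so the pattern above only stops at the last section because the sample codes contain a hyphen. The second code below ('A_B_C_D') and the stricter pattern `_[^_]+$` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Hypothetical codes: the second one has no hyphen, unlike the data in the question.
codes = pd.Series(['PC001-S002_D_CFI4-1_NN', 'A_B_C_D'])

# Pattern from the answer above: \w also matches "_", so the match
# can begin at an earlier underscore when nothing else interrupts it.
loose = codes.str.replace(r'_\w+?$', '', regex=True)

# Stricter assumption: the last section is everything after the final underscore.
strict = codes.str.replace(r'_[^_]+$', '', regex=True)

print(loose.tolist())   # ['PC001-S002_D_CFI4-1', 'A']
print(strict.tolist())  # ['PC001-S002_D_CFI4-1', 'A_B_C']
```

For the codes in the question both patterns give the same result; they differ only when a code has no non-word character between its underscores.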
MaxU - stop WAR against UA answered Nov 28 '25 20:11


Use str[:3] to select the first three elements of each list, then join them:

df['domain'] = df['code'].str.split("_").str[:3].str.join('_')
print(df)

                     code               domain
0  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
1  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
2  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
3  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
4  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
5  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
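
Keeping the first three sections and dropping the last section give the same result only when every code has exactly four sections. The sketch below compares the two on codes of different lengths; the second and third codes are hypothetical and were made up for illustration.

```python
import pandas as pd

# The first code comes from the question; the other two are hypothetical
# examples with five and two underscore-separated sections.
df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN',
                            'PC001-S002_D_CFI4-1_NN_X',
                            'PC001-S002_D']})

# Keep at most the first three sections.
df['first_three'] = df['code'].str.split('_').str[:3].str.join('_')

# Drop only the last section (n is keyword-only in pandas 2.0 and later).
df['drop_last'] = df['code'].str.rsplit('_', n=1).str[0]

print(df)
```

Which one is appropriate depends on whether the prefix length or the suffix is the fixed part of the code format.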
jezrael answered Nov 28 '25 19:11


