 

Replace a pandas column by splitting the text based on "_"

I have a pandas DataFrame as below:

import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df

    col
0   abcfg_grp_202005
1   abcmn_abc_202009
2   abcgd_xyz_8976
3   abcgd_lmn_1

I want to replace each value in "col" with its first part, i.e. the text before the first _. If the third part (after the second _) is a single digit, append it to the end with an _, as below:

    col
0   abcfg
1   abcmn
2   abcgd
3   abcgd_1
Shanoo asked Oct 05 '20 17:10



2 Answers

You can split the column and use DataFrame.apply row by row (axis=1):

df['col'] = df.col.str.split('_', expand=True).apply(lambda x: (x[0] + '_' + x[2]) if len(x[2]) == 1 else x[0], axis=1)

print(df)

       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1
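The check above only tests len(x[2]) == 1, so a one-character non-digit third part would also be appended, and a value with fewer than two underscores leaves x[2] as None, which makes len() fail. A minimal per-value sketch that also requires a digit, assuming every entry in "col" is a string (shorten is a hypothetical helper name):

```python
import pandas as pd


def shorten(value: str) -> str:
    """Keep the text before the first '_'; append '_<d>' if the third part is one digit."""
    parts = value.split('_')
    first = parts[0]
    # Append only when a third part exists and it is exactly one digit.
    if len(parts) >= 3 and len(parts[2]) == 1 and parts[2].isdigit():
        return f"{first}_{parts[2]}"
    return first


df = pd.DataFrame({'col': ['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
# Series.map calls shorten once per value.
df['col'] = df['col'].map(shorten)
print(df)
```

This trades the vectorized split for a plain Python function per row, which keeps the condition easy to read and extend.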
Mayank Porwal answered Oct 09 '22 07:10


Split on the underscores, then concatenate the strings. To handle the conditional addition, we can use the trick that a string multiplied by False is the empty string (and multiplied by True is the string itself). The condition checks for a one-character string that is a digit.
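In plain Python this works because bool is a subclass of int: True behaves like 1 and False like 0, so string repetition keeps the string once or not at all. A minimal illustration:

```python
suffix = "_1"

# True behaves like 1: the string is kept once.
kept = suffix * True
# False behaves like 0: the result is the empty string.
dropped = suffix * False

print(repr(kept))     # '_1'
print(repr(dropped))  # ''
```

The pandas version applies the same rule elementwise: the boolean Series decides, row by row, whether '_' + df1[2] is kept.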

df1 = df['col'].str.split('_', expand=True)
df['col'] = df1[0] + ('_' + df1[2])*(df1[2].str.len().eq(1) & df1[2].str.isdigit())

print(df)

       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1
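For comparison, the same rule can be sketched as a single regular expression with Series.str.replace, which accepts a callable replacement when regex=True. The pattern and the helper name keep_first are assumptions for illustration; values that do not have exactly three _-separated parts do not match and are left unchanged.

```python
import re

import pandas as pd

df = pd.DataFrame({'col': ['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})

# Group 1: text before the first '_'.
# Group 2: the third part, captured only if it is a single digit.
PATTERN = r'^([^_]+)_[^_]*_(?:(\d)|[^_]*)$'


def keep_first(m: re.Match) -> str:
    digit = m.group(2)
    # Append '_<digit>' only when the optional digit group matched.
    return f"{m.group(1)}_{digit}" if digit else m.group(1)


df['col'] = df['col'].str.replace(PATTERN, keep_first, regex=True)
print(df)
```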
ALollz answered Oct 09 '22 08:10