I have a pandas dataframe as below
import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df
col
0 abcfg_grp_202005
1 abcmn_abc_202009
2 abcgd_xyz_8976
3 abcgd_lmn_1
I want to replace 'col' as fist instance before _ in "col". IF there is a single digit in the 3rd instance after _ then append that to end of "col" as below
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
split() Pandas provide a method to split string around a passed separator/delimiter. After that, the string can be stored as a list in a series or it can also be used to create multiple column data frames from a single separated string.
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
Select the cell or column that contains the text you want to split. Select Data > Text to Columns. In the Convert Text to Columns Wizard, select Delimited > Next. Select the Delimiters for your data.
You can use df.apply
:
In [1441]: df['col'] = df.col.str.split('_', expand=True).apply(lambda x: (x[0] + '_' + x[2]) if len(x[2]) == 1 else x[0], axis=1)
In [1442]: df
Out[1442]:
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
Split on the underscores, then add the strings. Here we can use the trick that False
multiplied by a string returns the empty string to deal with the conditional addition. The check is a 1 character string that is a digit.
df1 = df['col'].str.split('_', expand=True)
df['col'] = df1[0] + ('_' + df1[2])*(df1[2].str.len().eq(1) & df1[2].str.isdigit())
print(df)
col
0 abcfg
1 abcmn
2 abcgd
3 abcgd_1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With