Replace a pandas column by splitting the text based on "_"

Tags: python, python-3.x, pandas, dataframe

I have a pandas dataframe as below

import pandas as pd
df = pd.DataFrame({'col':['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df

    col
0   abcfg_grp_202005
1   abcmn_abc_202009
2   abcgd_xyz_8976
3   abcgd_lmn_1

I want to replace each value in "col" with the text before the first _. If the third part (the text after the second _) is a single digit, then append _ and that digit to the result, as below:

    col
0   abcfg
1   abcmn
2   abcgd
3   abcgd_1
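To make the rule concrete, here is a minimal per-value sketch. The helper name shorten is hypothetical, and the sketch assumes that a value with fewer than three underscore-separated parts should simply keep its first part:

```python
import pandas as pd

def shorten(value: str) -> str:
    """Keep the text before the first '_'; append '_<d>' if the third part is one digit."""
    parts = value.split('_')
    head = parts[0]
    # Only inspect the third part when it exists (assumption: otherwise keep just the head)
    if len(parts) >= 3 and len(parts[2]) == 1 and parts[2].isdigit():
        return head + '_' + parts[2]
    return head

df = pd.DataFrame({'col': ['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
# Series.map calls shorten once per value
df['col'] = df['col'].map(shorten)
print(df)
```

This is not vectorized, but it states the requirement in one place and is easy to test on individual strings.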


asked Oct 05 '20 17:10

Shanoo

2 Answers

You can split on _ and use df.apply row by row:

df['col'] = df.col.str.split('_', expand=True).apply(lambda x: (x[0] + '_' + x[2]) if len(x[2]) == 1 and x[2].isdigit() else x[0], axis=1)

df
       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1
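The apply approach raises a TypeError if some values have three parts and others have fewer, because expand=True pads the missing pieces with None (or NaN) and len() is not defined for those. A minimal sketch of a more tolerant row function, assuming such short values should keep only their first part (the name join_row and the sample values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': ['abcfg_grp_202005', 'abcgd_lmn_1', 'abcxy_only', 'plain']})
parts = df['col'].str.split('_', expand=True)

def join_row(row: pd.Series) -> str:
    # row.get returns None if column 2 does not exist at all
    third = row.get(2)
    # expand=True pads shorter splits with a missing value, so check the type before len()
    if isinstance(third, str) and len(third) == 1 and third.isdigit():
        return row[0] + '_' + third
    return row[0]

df['col'] = parts.apply(join_row, axis=1)
print(df)
```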


answered Oct 09 '22 07:10

Mayank Porwal

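Split on the underscores, then concatenate the pieces. For the conditional part, use the trick that multiplying a string by False returns the empty string (and multiplying by True returns it unchanged). The condition checks that the third piece is a one-character string that is a digit.

# Split each value into columns 0, 1, 2 on '_'
df1 = df['col'].str.split('_', expand=True)
# The suffix survives only where the third piece is a single digit (string * False == '')
df['col'] = df1[0] + ('_' + df1[2])*(df1[2].str.len().eq(1) & df1[2].str.isdigit())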

print(df)

       col
0    abcfg
1    abcmn
2    abcgd
3  abcgd_1
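The multiplication trick works because bool is a subclass of int in Python, so True behaves like 1 and False like 0 when repeating a string. If that feels too clever, an equivalent sketch swaps the multiplication for Series.where, which keeps the suffix where the condition holds and substitutes '' elsewhere (this variant is an alternative, not the answerer's code):

```python
import pandas as pd

# bool is an int subclass: repeating a string True/False times keeps or drops it
assert 'abc' * True == 'abc'
assert 'abc' * False == ''

df = pd.DataFrame({'col': ['abcfg_grp_202005', 'abcmn_abc_202009', 'abcgd_xyz_8976', 'abcgd_lmn_1']})
df1 = df['col'].str.split('_', expand=True)

# Same condition as above: the third piece is exactly one character and is a digit
is_single_digit = df1[2].str.len().eq(1) & df1[2].str.isdigit()

# Keep '_<digit>' where the condition is True, use '' where it is False
df['col'] = df1[0] + ('_' + df1[2]).where(is_single_digit, '')
print(df)
```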

answered Oct 09 '22 08:10

ALollz
