I have a dataframe that have two values:
df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})
What I am trying to do is to remove the numeric digits in case of the split('_') only have numeric digits. The desired output is:
Table_A112
Table_A_
For that I am using the following code:
import pandas as pd
import difflib
from tabulate import tabulate
import string
df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})
print(tabulate(df, headers='keys', tablefmt='psql'))
df['Col2'] = df['Col1'].str.rstrip(string.digits)
print(tabulate(df, headers='keys', tablefmt='psql'))
But it gives me the following output:
Table_A
Table_A_
How can do what I want?
Thanks!
You can do something like:
s = df['Col1'].str.split('_',expand=True).stack()
s.mask(s.str.isdigit(), '').groupby(level=0).agg('_'.join)
Output:
0 Table_A112
1 Table_A_
dtype: object
Here's one way using str.replace
:
df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112', 'Table_112_avs']})
print(df)
Col1
0 Table_A112
1 Table_A_112
2 Table_112_avs
df.Col1.str.replace(r'(?:^|_)(\d+)(?:$|_)', '_', regex=True)
0 Table_A112
1 Table_A_
2 Table_avs
Name: Col1, dtype: object
See demo
If you insist on a regex solution you can do using pandas.replace()
and a positive lookbehind r'(?<=_)\d+'
import pandas as pd
from tabulate import tabulate
df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})
print(tabulate(df, headers='keys', tablefmt='psql'))
df= df.replace(regex=r'(?<=_)\d+', value='')
print(tabulate(df, headers='keys', tablefmt='psql'))
Which produces the desired output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With