I am trying to remove the substring rs. from the values in a column:
df['Purpose'] = df['Purpose'].str.replace('rs.','')
+-------+----------+--------+
| Input | Expected | Output |
+-------+----------+--------+
| rs.22 | 22 | 22 |
+-------+----------+--------+
| rs32 | rs32 | 2 |
+-------+----------+--------+
The code for testing:
x = pd.DataFrame(['rs.22', 'rs32'], columns=['Purpose'])
x['Purpose'] = x['Purpose'].str.replace('rs.','')
print('x mod', x)
This gives the following output:
x mod Purpose
0 22
1 2
PS: I also tried extracting only the numbers with the regex [-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?, but for rs.3.5 it returned .3.5 instead of 3.5.
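In pandas versions before 2.0, str.replace treats the pattern as a regular expression by default, so the unescaped dot in 'rs.' matches any character. That is why 'rs32' becomes '2': the pattern matches 'rs3'. You have two simple options to get around it. The preferred one, suggested by @101, is to turn off regex: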
df['Purpose'] = df['Purpose'].str.replace('rs.', '', regex=False)
Another alternative is to escape the dot so it matches an actual period instead of any character. This is the option to use in versions of pandas before 0.23.0, when the regex parameter was introduced:
df['Purpose'] = df['Purpose'].str.replace(r'rs\.', '')
Regex matching is generally slower than simple string comparisons, so the first option can be expected to be more performant.
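To compare the two options side by side, here is a minimal sketch based on the test frame from the question. The extra 'rs.3.5' row comes from the PS, and the variable names literal and escaped are only illustrative. regex is passed explicitly in both calls so the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Test data from the question, plus the 'rs.3.5' case mentioned in the PS.
x = pd.DataFrame(['rs.22', 'rs32', 'rs.3.5'], columns=['Purpose'])

# Option 1: treat 'rs.' as a literal string, not a pattern.
literal = x['Purpose'].str.replace('rs.', '', regex=False)

# Option 2: keep regex mode, but escape the dot so it matches a real period.
escaped = x['Purpose'].str.replace(r'rs\.', '', regex=True)

print(literal.tolist())  # ['22', 'rs32', '3.5']
print(escaped.tolist())  # ['22', 'rs32', '3.5']
```

Both calls leave 'rs32' untouched and strip only the literal 'rs.' prefix, so 'rs.3.5' becomes '3.5'.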