I want to extract a numerical percentage from a string. Here are some cases:
Commas are used solely as separators and there's only one percentage for each string, so the following strings will never occur:
Currently, I'm using the following script in Python:
import re

def extract_percentage(x: str) -> float:
    return float(re.sub(r'[^\d,]', '', x).replace(',', '.'))
It works for the first two examples above, but for the third, the output is 12.3
How should I do it? Preferably, using Python.
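Your regex removes every character that is not a digit or a comma, including spaces, so digits from elsewhere in the string get glued onto the percentage. To find something with a regex, it is usually better to search for it directly (re.search or re.findall from the re library) than to delete everything around it.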
We will start by looking for everything up to a percent sign: '.*%'.
For Bank ABC 123% CDE this returns Bank ABC 123%, which still contains spaces and non-digits.
To improve on that, let's look for numbers with at most one comma or dot: \d*[,.]?\d*%. For the same input, this returns 123%.
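As a quick check of the two patterns above, here is a small sketch that runs both against the sample string from this answer. The variable names (sample, greedy, narrow) are only for illustration.

```python
import re

sample = "Bank ABC 123% CDE"  # sample string used in the explanation above

# '.*%' is greedy: it matches everything from the start up to the percent sign.
greedy = re.findall(r".*%", sample)

# Allowing only digits with at most one separator isolates the number itself.
narrow = re.findall(r"\d*[,.]?\d*%", sample)

print(greedy)
print(narrow)
```

Note that every part of the second pattern except the percent sign is optional, so it would also match a bare % with no digits in front of it.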
To wrap things up, let's replace the comma with a dot:
import re
text = 'Bank1 2,3%'
arr = [x.replace(',', '.') for x in re.findall(r'\d*[,.]?\d*%', text)]
print(arr)
>>> ['2.3%']
Note that the result is a list of all matches.
If you want to get the number out, you can now just do:
if arr:
    number_without_percent_sign = arr[0][:-1]
    print(float(number_without_percent_sign))
>>> 2.3
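Putting the steps together, one possible drop-in for the function in the question could look like the sketch below. It is not the only way to do it and it makes a few assumptions that are not in the question: the pattern requires at least one digit before the percent sign (slightly stricter than the pattern above), the percent sign directly follows the number, and the function returns None when no percentage is found. The name _PERCENT_RE is hypothetical.

```python
import re
from typing import Optional

# Assumption: a percentage is one or more digits, optionally followed by a
# single comma or dot and more digits, immediately followed by '%'.
_PERCENT_RE = re.compile(r"(\d+(?:[.,]\d+)?)%")


def extract_percentage(text: str) -> Optional[float]:
    """Return the first percentage in text as a float, or None if absent."""
    match = _PERCENT_RE.search(text)
    if match is None:
        return None
    # Normalise a decimal comma to a dot so float() can parse it.
    return float(match.group(1).replace(",", "."))


print(extract_percentage("Bank1 2,3%"))
print(extract_percentage("Bank ABC 123% CDE"))
```

Capturing the number in a group avoids slicing off the percent sign afterwards, and searching for the first match fits the stated guarantee that each string holds only one percentage.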