How to extract all numeric-like values from a string?

Question

I have strings that contain mixed numeric and non-numeric values, and I want to extract the numeric values from the text. I could not work out how to cover all (or most) of the possible cases. I have a partially working solution:

import re

def extract_values(sentence):
    #sentence = normalizeString(sentence)
    matches = re.findall(r'((\d*\.?\d+(?:\/\d*\.?\d+)?)(?:\s+and\s+(\d*\.?\d+(?:\/\d*\.?\d+)?))?)', sentence)
    # (\d\sto\s\d\s(and\s\d\/\d)*) << for adding 9 to 11, couldn't fix

    result = []
    # x is the whole match, y the first number, z the optional number after "and"
    for x, y, z in matches:
        if '/' in x:
            result.append(x)
        else:
            result.extend(v for v in (y, z) if v != "")
    return result

Driver code:

extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20")

Incorrect answer:

['1 and 1/2', '5', '5', '9', '11', '9', '9 and 1/2', '11/12', '20']

Expected answer:

['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']

Please note the difference between 5 and .5, and between 'x to y' and 'x to y and z'.

I would appreciate any help. Thank you.

Wiktor Stribiżew · Accepted Answer

You can use

import re

def extract_values(sentence):
    num = r'\d*\.?\d+(?:/\d*\.?\d+)*'
    return re.findall(fr'{num}(?:\s+(?:and|to)\s+{num})*', sentence)

print(extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20"))
# => ['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']
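
To see why this handles inputs such as .5 and 11/12/20, the num building block can be checked on its own. The following is only a sketch: the helper name is_num and the sample strings are made up for illustration and are not part of the answer.

```python
import re

# Same sub-pattern as `num` in the answer above.
num = r'\d*\.?\d+(?:/\d*\.?\d+)*'

def is_num(text):
    """Hypothetical helper: True if the whole string is matched by `num`."""
    return re.fullmatch(num, text) is not None

# Plain integers, floats with or without leading digits, and slash-separated chains.
for candidate in ["5", ".5", "1.5", "1/2", "11/12/20"]:
    print(candidate, is_num(candidate))   # each prints True

# A trailing dot or slash is not part of a number, and letters never match.
for candidate in ["5.", "1/", "a1"]:
    print(candidate, is_num(candidate))   # each prints False
```

Note that re.findall searches anywhere in the text and the pattern has no word boundaries, so digits embedded in a word (for example the 4 in room4b) would also be returned by extract_values.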

See the Python demo, and the regex demo.

Details:

\d*\.?\d+(?:/\d*\.?\d+)* - an int or float number, followed by zero or more occurrences of / and another int or float number
(?:\s+(?:and|to)\s+\d*\.?\d+(?:/\d*\.?\d+)*)* - zero or more occurrences of:
- \s+(?:and|to)\s+ - the word and or to, surrounded by one or more whitespace characters
- \d*\.?\d+(?:/\d*\.?\d+)* - an int or float number, followed by zero or more occurrences of / and another int or float number.
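
The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is a sketch of an equivalent formulation, not the answer's own code; the names NUM, PATTERN and extract_values_verbose are assumptions chosen for illustration.

```python
import re

# One "number": an int or float, optionally chained with "/number" parts.
NUM = r"""
    \d*\.?\d+            # int or float; leading digits are optional, so .5 matches
    (?:/\d*\.?\d+)*      # zero or more /number parts, e.g. 1/2 or 11/12/20
"""

PATTERN = re.compile(
    rf"""
    {NUM}
    (?:                      # zero or more continuations:
        \s+(?:and|to)\s+     #   the word and/to surrounded by whitespace
        {NUM}                #   followed by another number
    )*
    """,
    re.VERBOSE,
)

def extract_values_verbose(sentence):
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(sentence)

text = ("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. "
        "He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20")
print(extract_values_verbose(text))
# => ['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']
```

Because every group is non-capturing, re.findall returns the full matched text rather than tuples of groups, which is what removes the need for the post-processing loop in the question.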

Tags: python, regex, number-formatting

Droid-Bird
