 

Python - Parse string, known structure

I have to parse a list of simple strings with a known structure but I'm finding it unnecessarily clunky. I feel I'm missing a trick, perhaps some simple regex that would make this trivial?

Each string refers to some number of years and/or months in the future, and I want to convert it to decimal years.

Generic format: "aYbM"

Here a is the number of years and b is the number of months. Both are integers, and each is optional (along with its identifier).

Test cases:

5Y3M == 5.25
5Y == 5.0
6M == 0.5
10Y11M == 10.91666...
3Y14M -> raise ValueError("string '%s' cannot be parsed" % input_string)

My attempts so far have involved string splitting and been pretty cumbersome though they do produce the correct results:

def parse_aYbM(maturity_code):
    maturity = 0
    if "Y" in maturity_code:
        maturity += float(maturity_code.split("Y")[0])
        if "M" in maturity_code:
            maturity += float(maturity_code.split("Y")[1].split("M")[0]) / 12
        return maturity
    elif "M" in maturity_code:
        return float(maturity_code[:-1]) / 12
    else:
        return 0 
Asked by David258 on Jul 05 '15 at 01:07



1 Answer

You could use the regex pattern

(?:(\d+)Y)?(?:(\d+)M)?

which means

(?:        start a non-capturing group
  (\d+)    match 1-or-more digits, captured as group 1
  Y        followed by a literal Y
)?         end the non-capturing group; matched 0-or-1 times
(?:        start another non-capturing group
  (\d+)    match 1-or-more digits, captured as group 2
  M        followed by a literal M
)?         end the non-capturing group; matched 0-or-1 times
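The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of that idea; the name A_Y_B_M is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# A_Y_B_M is a hypothetical name chosen for this example.
A_Y_B_M = re.compile(r"""
    (?: (\d+) Y )?   # optional years: one or more digits, then a literal Y
    (?: (\d+) M )?   # optional months: one or more digits, then a literal M
""", re.VERBOSE)

years, months = A_Y_B_M.match("10Y11M").groups()
print(years, months)
```

The compiled pattern behaves like the one-line version, so it can be swapped in wherever the raw string is used.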

When used in

re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()

the groups() method returns a tuple of the substrings captured by each pair of capturing parentheses. An entry is None if its group did not take part in the match. For example,

In [220]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '5Y3M').groups()
Out[220]: ('5', '3')

In [221]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '3M').groups()
Out[221]: (None, '3')
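One detail worth checking: because both groups are optional, the pattern can also match the empty string, so re.match returns a match object (not None) even for unrelated input. A small sketch, where the variable names are only illustrative:

```python
import re

pattern = r'(?:(\d+)Y)?(?:(\d+)M)?'

m = re.match(pattern, 'foo')
print(m is None)         # False: the pattern matched the empty string at position 0
print(m.groups())        # (None, None)
print(repr(m.group(0)))  # ''
```

This is why a parser built on this pattern has to inspect the captured groups rather than test whether the match is None.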

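Putting this together:

import re

def parse_aYbM(text):
    # Both groups are optional, so this match always succeeds (possibly on an empty prefix)
    a, b = re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()
    if a is None and b is None:
        raise ValueError('input does not match aYbM')
    # Treat a missing group as 0
    a, b = [int(item) if item is not None else 0 for item in (a, b)]
    return a + b/12.0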

tests = [
('5Y3M', 5.25),
('5Y', 5.0),
('6M', 0.5),
('10Y11M', 10.917),
('3Y14M', 4.167),
]

for test, expected in tests:
    result = parse_aYbM(test)
    status = 'Failed'
    if abs(result - expected) < 0.001:
        status = 'Passed'
    print('{}: {} --> {}'.format(status, test, result))

yields

Passed: 5Y3M --> 5.25
Passed: 5Y --> 5.0
Passed: 6M --> 0.5
Passed: 10Y11M --> 10.9166666667
Passed: 3Y14M --> 4.16666666667

Note that it's not clear what should happen if the input to parse_aYbM does not match the pattern. With the code above, input in which neither the years nor the months group matches raises ValueError:

In [227]: parse_aYbM('foo')
ValueError: input does not match aYbM

but a partial match may return a value, because re.match only anchors the pattern at the start of the string:

In [229]: parse_aYbM('0Yfoo')
Out[229]: 0.0
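If stricter behaviour is wanted, one possible variant anchors the pattern at the end of the string with \Z and rejects out-of-range months, so that 3Y14M raises as in the question's test cases. This is a sketch under assumptions: the name parse_aYbM_strict is hypothetical, and treating any month count above 11 (including 12M) as invalid is a guess at the intended rule. On Python 3.4+ re.fullmatch could be used instead of the \Z anchor.

```python
import re

# Hypothetical stricter variant; \Z forces the pattern to consume the whole string.
_STRICT = re.compile(r'(?:(\d+)Y)?(?:(\d+)M)?\Z')

def parse_aYbM_strict(text):
    m = _STRICT.match(text)
    # No match means trailing junk; an empty match means neither part was given.
    if m is None or m.group(0) == '':
        raise ValueError("string '%s' cannot be parsed" % text)
    years, months = (int(g) if g is not None else 0 for g in m.groups())
    # Assumption: months must be in the range 0-11.
    if months > 11:
        raise ValueError("string '%s' cannot be parsed" % text)
    return years + months / 12.0

print(parse_aYbM_strict('5Y3M'))  # 5.25
```

With this variant, 'foo', '0Yfoo', '3Y14M' and the empty string all raise ValueError instead of returning a value.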
Answered by unutbu on Sep 24 '22 at 06:09
