
regex extract different parts of a string in consistent order

Tags: python, regex

I have a list of strings:

my_strings = [
    "2002-03-04 with Matt",
    "Important: 2016-01-23 with Mary",
    "with Tom on 2015-06-30",
]

I want to extract:

  • date (always in yyyy-mm-dd format)
  • person (always in the form "with person"), but I don't want to keep "with"

I could do:

import re
pattern = r'.*(\d{4}-\d{2}-\d{2}).*with \b([^\b]+)\b.*'
matched = [re.match(pattern, x).groups() for x in my_strings]

but it fails because the pattern doesn't match "with Tom on 2015-06-30", where the person comes before the date.

Questions

How do I specify the regex pattern to be indifferent to the order in which date or person appear in the string?

and

How do I ensure that the groups() method returns them in the same order every time?

I expect the output to look like this:

[('2002-03-04', 'Matt'), ('2016-01-23', 'Mary'), ('2015-06-30', 'Tom')]
piRSquared asked Dec 25 '22 05:12


1 Answer

What about doing it with two separate regexes?

import re

my_strings = [
    "2002-03-04 with Matt",
    "Important: 2016-01-23 with Mary",
    "with Tom on 2015-06-30",
]

# First pass: pull the yyyy-mm-dd date out of each string.
pattern = r'.*(\d{4}-\d{2}-\d{2})'
dates = [re.match(pattern, x).groups()[0] for x in my_strings]

# Second pass: pull out the word that follows "with ".
pattern = r'.*with (\w+).*'
persons = [re.match(pattern, x).groups()[0] for x in my_strings]

# zip() returns an iterator in Python 3, so build a list before printing.
output = list(zip(dates, persons))
print(output)
## [('2002-03-04', 'Matt'), ('2016-01-23', 'Mary'), ('2015-06-30', 'Tom')]
Julien Spronck answered Dec 26 '22 19:12
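
If a single pattern is preferred, one possible approach is to put each part inside its own lookahead. This is only a sketch under a few assumptions: the group names date and person are illustrative, the person is taken to be a single word (\w+) following "with", and strings missing either part are simply skipped. Because lookaheads do not consume any text, both start scanning from the beginning of the string, so the order of the parts in the input does not matter, and the groups always come back in the order they are written in the pattern.

```python
import re

my_strings = [
    "2002-03-04 with Matt",
    "Important: 2016-01-23 with Mary",
    "with Tom on 2015-06-30",
]

# Each lookahead scans from position 0 without consuming anything,
# so the date and the person may appear in either order.
pattern = re.compile(
    r"(?=.*?(?P<date>\d{4}-\d{2}-\d{2}))"  # group 1: the yyyy-mm-dd date
    r"(?=.*?\bwith\s+(?P<person>\w+))"     # group 2: the word after "with"
)

matched = []
for s in my_strings:
    m = pattern.match(s)
    if m is not None:  # skip strings that lack a date or a person
        matched.append((m.group("date"), m.group("person")))

print(matched)
# [('2002-03-04', 'Matt'), ('2016-01-23', 'Mary'), ('2015-06-30', 'Tom')]
```

Named groups are optional here; m.groups() would return the same (date, person) order, since group numbering follows the pattern and not the input string.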