Python - finding date in a string

Tags: python, string, date

I want to be able to read a string and return the first date that appears in it. Is there a ready-made module that I can use? I tried to write regexes for every possible date format, but the list gets quite long. Is there a better way to do it?

666

asked Jul 03 '11 09:07

Zvi

1 Answer

You can run a date parser on every substring of your text and pick the first date it finds. Of course, such a solution will either catch things that are not dates or miss things that are, and most likely both.

Let me provide an example that uses dateutil.parser to catch anything that looks like a date:

import re
from itertools import chain

import dateutil.parser

# Tokens the parser treats as filler; a region made up only of these
# (or of whitespace) is not worth parsing.
# Add more strings that confuse the parser to this list.
UNINTERESTING = set(chain(dateutil.parser.parserinfo.JUMP,
                          dateutil.parser.parserinfo.PERTAIN,
                          ['a']))

def _get_date(tokens):
    # Try the longest prefix first and shrink it until something parses.
    for end in range(len(tokens), 0, -1):
        region = tokens[:end]
        if all(token.isspace() or token in UNINTERESTING
               for token in region):
            continue
        text = ''.join(region)
        try:
            date = dateutil.parser.parse(text)
            return end, date
        except (ValueError, OverflowError):
            # Not a date (or a number too large to be one); try a shorter prefix.
            pass

def find_dates(text, max_tokens=50, allow_overlapping=False):
    # Split into runs of non-space and non-word characters, dropping the
    # empty strings that re.split leaves between matches.
    tokens = [token for token in re.split(r'(\S+|\W+)', text) if token]
    skip_dates_ending_before = 0
    for start in range(len(tokens)):
        region = tokens[start:start + max_tokens]
        result = _get_date(region)
        if result is not None:
            end, date = result
            if allow_overlapping or end > skip_dates_ending_before:
                skip_dates_ending_before = end
                yield date


test = """Adelaide was born in Finchley, North London on 12 May 1999. She was a 
child during the Daleks' abduction and invasion of Earth in 2009. 
On 1st July 2058, Bowie Base One became the first Human colony on Mars. It 
was commanded by Captain Adelaide Brooke, and initially seemed to prove that 
it was possible for Humans to live long term on Mars."""

print("With no overlapping:")
for date in find_dates(test, allow_overlapping=False):
    print(date)


print("With overlapping:")
for date in find_dates(test, allow_overlapping=True):
    print(date)

The result from the code is, quite unsurprisingly, rubbish whether you allow overlapping or not. If overlapping is allowed, you get a lot of dates that appear nowhere in the text, and if it is not allowed, you miss an important date that does appear in it.

With no overlapping:
1999-05-12 00:00:00
2009-07-01 20:58:00
With overlapping:
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-12 00:00:00
1999-05-03 00:00:00
1999-05-03 00:00:00
1999-07-03 00:00:00
1999-07-03 00:00:00
2009-07-01 20:58:00
2009-07-01 20:58:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-01 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00
2058-07-03 00:00:00

Essentially, if overlapping is allowed:

"12 May 1999" is parsed to 1999-05-12 00:00:00
"May 1999" is parsed to 1999-05-03 00:00:00 (the day is missing, so the parser fills it in from today's date, and today is the 3rd day of the month)
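
The day-of-month fill-in can be made visible with the `default` argument of `dateutil.parser.parse`, which supplies the values for any fields missing from the string. The sketch below is only an illustration: `FIXED_DEFAULT` and `parse_with_fixed_default` are names made up for this example, not part of dateutil.

```python
from datetime import datetime

import dateutil.parser

# Hypothetical fixed default: missing fields come from here instead of
# from today's date.
FIXED_DEFAULT = datetime(2000, 1, 1)

def parse_with_fixed_default(text):
    # Fields absent from `text` are copied from FIXED_DEFAULT.
    return dateutil.parser.parse(text, default=FIXED_DEFAULT)

print(parse_with_fixed_default("May 1999"))     # 1999-05-01 00:00:00
print(parse_with_fixed_default("12 May 1999"))  # 1999-05-12 00:00:00
```

With a fixed default the spurious day no longer changes from one run to the next, but the parser still reports a full date for a string that only names a month and a year.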

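If, however, overlapping is not allowed, "2009. On 1st July 2058" is parsed as 2009-07-01 20:58:00: the parser takes 2009 as the year and reads 2058 as the time 20:58. Because that span swallows "1st July 2058", the date after the period is never reported on its own.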
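
One way to sidestep both failure modes, at the cost of recognizing far fewer formats, is to hand over only spans that already look like a date. The following is a minimal sketch under a strong assumption: dates are written as day, full English month name, four-digit year (as in the test text). `MONTHS`, `DATE_RE`, `iter_dates` and `find_first_date` are hypothetical names for this example, and it uses only the standard library rather than dateutil.

```python
import re
from datetime import date

# Full English month names only; abbreviations are deliberately not handled.
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Day (with optional ordinal suffix), a word, then a four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s+(\d{4})\b")

def iter_dates(text):
    """Yield a date for every 'day month year' span found in text."""
    for match in DATE_RE.finditer(text):
        day, month_name, year = match.groups()
        month = MONTHS.get(month_name.lower())
        if month is None:
            continue  # the word between the numbers is not a month name
        try:
            yield date(int(year), month, int(day))
        except ValueError:
            continue  # e.g. "31 February 2011"

def find_first_date(text):
    """Return the first date in text, or None if there is none."""
    return next(iter_dates(text), None)

sample = ("born in Finchley, North London on 12 May 1999. She was a child "
          "during the invasion of Earth in 2009. On 1st July 2058, Bowie "
          "Base One became the first Human colony on Mars.")
print(find_first_date(sample))  # 1999-05-12
print(list(iter_dates(sample)))
```

This trades coverage for precision: the bare year "2009" is ignored rather than turned into a date, and any format outside the single pattern is missed, which is the regex-maintenance problem raised in the question.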

116

answered Oct 15 '22 00:10

Rosh Oxymoron
