
A "smart" (forgiving) date parser?

I have to migrate a very large dataset from one system to another. One of the source columns holds a date, but it is really an unconstrained string, while the destination system mandates a date in the format yyyy-mm-dd.

Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (in Perl):

return "$1-$2-$3" if ($val =~ /(\d{4})[-\/]*(\d{2})[-\/]*(\d{2})/);

The problem arises when the source dates move away from the "generic" yyyymmdd. The goal is to salvage as many dates as possible before giving up. Example source strings include:

21/3/1998, March 2004, 2001, 3/4/97

I can try to match as many of the examples as I can find with a succession of regular expressions such as the one above.
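For reference, the "succession of regular expressions" idea could be sketched roughly like this, using core Perl only. The name normalize_date is made up for illustration, and several choices are assumptions rather than facts about the data: slash-separated dates are read day first, two-digit years use an arbitrary pivot of 30, and a missing day or month defaults to 01.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): try patterns from most
# to least specific; return yyyy-mm-dd, or nothing when no pattern fits.
my %MONTH = (jan => 1, feb => 2, mar => 3, apr => 4,  may => 5,  jun => 6,
             jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

sub normalize_date {
    my ($val) = @_;
    return unless defined $val;
    my ($y, $m, $d);

    if ($val =~ m{^\s*(\d{4})[-/]?(\d{2})[-/]?(\d{2})\s*$}) {
        ($y, $m, $d) = ($1, $2, $3);            # 19980321, 1998-03-21
    }
    elsif ($val =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$}) {
        ($d, $m, $y) = ($1, $2, $3);            # 21/3/1998, day first (assumed)
    }
    elsif ($val =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{2})\s*$}) {
        ($d, $m) = ($1, $2);                    # 3/4/97, day first (assumed)
        $y = $3 < 30 ? 2000 + $3 : 1900 + $3;   # arbitrary century pivot (assumed)
    }
    elsif ($val =~ /^\s*([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})\s*$/
           && $MONTH{lc $1}) {
        ($y, $m, $d) = ($2, $MONTH{lc $1}, 1);  # March 2004, day defaults to 1
    }
    elsif ($val =~ /^\s*(\d{4})\s*$/) {
        ($y, $m, $d) = ($1, 1, 1);              # 2001, defaults to 1 January
    }
    else {
        return;                                 # give up
    }

    # Reject impossible dates such as month 13 or 31 February.
    my $leap = ($y % 4 == 0 && ($y % 100 != 0 || $y % 400 == 0));
    my @days = (31, $leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return if $m < 1 || $m > 12 || $d < 1 || $d > $days[$m - 1];

    return sprintf '%04d-%02d-%02d', $y, $m, $d;
}

for my $s ('21/3/1998', 'March 2004', '2001', '3/4/97') {
    my $iso = normalize_date($s);
    print "$s => ", (defined $iso ? $iso : 'unparsed'), "\n";
}
# 21/3/1998 => 1998-03-21
# March 2004 => 2004-03-01
# 2001 => 2001-01-01
# 3/4/97 => 1997-04-03
```

The obvious weakness is that every new format needs its own branch, and the day-first reading of 3/4/97 is a guess.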

But is there something smarter to do? Am I reinventing the wheel? Is there a library somewhere that does something similar? I couldn't find anything relevant by googling "forgiving date parser". (Any language is OK.)

Jean-Denis Muys asked Jul 09 '09 10:07



1 Answer

Date::Manip is your friend. Out of the box it fails on only one of your four examples, because it assumes the US date format (month before day). After calling Date_Init with a non-US date format, it parses all four.

If you have different formats (i.e., month before day and vice versa), you'd have to parse them differently: once with the US date format and once with a non-US date format. This matters most when the string is ambiguous, like your 3/4/97 example. A string like 21/3 simply fails under the US format, so you can tell that format is wrong.

vinko@mithril:~$ more date.pl
use strict;
use warnings;
use Date::Manip;

my @a;
push @a, "March 2004";
push @a, "2001";
push @a, "3/4/97";
push @a, "21/3/1998";
Date_Init("DateFormat=non-US");
for my $d (@a) {
    print "$d\n";
    print ParseDate($d)."\n";
};
vinko@mithril:~$ perl date.pl
March 2004
2004030100:00:00
2001
2001010100:00:00
3/4/97
1997040300:00:00
21/3/1998
1998032100:00:00
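
To get from there to the yyyy-mm-dd the destination expects, and to flag the ambiguous cases, something along these lines might work. It is only a sketch: parse_as and classify are made-up helper names, while Date_Init, ParseDate and UnixDate come from Date::Manip's functional interface. It assumes ParseDate returns an empty string when it cannot parse the input, and it uses the // operator, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use Date::Manip;

# Hypothetical helper: parse $str under one DateFormat setting ('US' or
# 'non-US') and return yyyy-mm-dd, or undef when parsing fails.
sub parse_as {
    my ($format, $str) = @_;
    Date_Init("DateFormat=$format");
    my $date = ParseDate($str);
    return undef unless $date;
    return scalar UnixDate($date, '%Y-%m-%d');
}

# Hypothetical helper: returns (date, status), where status is
# 'ok', 'ambiguous' (the two readings disagree) or 'failed'.
sub classify {
    my ($str) = @_;
    my $us     = parse_as('US',     $str);
    my $non_us = parse_as('non-US', $str);

    return (undef, 'failed') unless defined $us || defined $non_us;
    return ($us // $non_us, 'ok')
        if !defined $us || !defined $non_us || $us eq $non_us;
    return ($non_us, 'ambiguous');   # preferring non-US here is an assumption
}

for my $s ('3/4/97', '21/3/1998') {
    my ($iso, $status) = classify($s);
    print "$s => ", $iso // 'unparsed', " ($status)\n";
}
```

With this, 21/3/1998 should only parse under the non-US setting and come back as 'ok', while 3/4/97 parses both ways and gets flagged as 'ambiguous' so it can be reviewed by hand.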
Vinko Vrsalovic answered Oct 12 '22 14:10
