Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

1 Answers

If you are guaranteed to have well-formed data consisting of nothing else but a singleton date in the DD-MM-YYYY format, then this works:

# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));

That works on a $date holding "07/04/1776" but fails on "this 17/01/2010 and that 01/17/2010 there". Instead, use:

# SECOND METHOD
($ndate = $date) =~ s{
    \b
      ( \d \d   )
    / ( \d \d   )
    / ( \d {4}  )
    \b
}{$3-$2-$1}gx;

If you prefer a more "grammatical" regex, so that it’s easier to maintain and update, you can instead use this:

# THIRD METHOD
($ndate = $date) =~ s{
    (?&break)

              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )

    (?&break)

    (?(DEFINE)
        (?<break> \b     )
        (?<slash> /      )
        (?<year>  \d {4} )
        (?<month> \d {2} )
        (?<day>   \d {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;

Finally, if you have Unicode data, you might want to be a bit more careful.

# FOURTH METHOD
($ndate = $date) =~ s{
    (?&break_before)
              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )
    (?&break_after)

    (?(DEFINE)
        (?<slash>     /                  )
        (?<start>     \A                 )
        (?<finish>    \z                 )

        # don't really want to use \D or [^0-9] here:
        (?<break_before>
           (?<= [\pC\pP\pS\p{Space}] )
         | (?<= \A                )
        )
        (?<break_after>
            (?= [\pC\pP\pS\p{Space}]
              | \z
            )
        )
        (?<digit> \d            )
        (?<year>  (?&digit) {4} )
        (?<month> (?&digit) {2} )
        (?<day>   (?&digit) {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;

You can see how each of these four approaches performs when confronted with sample input strings like these:

my $sample  = q(17/01/2010);
my @strings =  (
    $sample,  # trivial case

    # multiple case
    "this $sample and that $sample there",

    # multiple case with non-ASCII BMP code points
    # U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
    "from \x{201c}$sample\x{201d} through\xA0$sample",

    # multiple case with non-ASCII code points
    #   from both the BMP and the SMP 
    # code point U+02013 is EN DASH, props \pP \p{Pd}
    # code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
    # code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
    "\x{10179}$sample\x{2013}\x{110BD}$sample",
);

Now letting $date be a foreach iterator through that array, we get this output:

Original is:   17/01/2010
First method:  2010-01-17
Second method: 2010-01-17
Third method:  2010-01-17
Fourth method: 2010-01-17

Original is:   this 17/01/2010 and that 17/01/2010 there
First method:  2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method:  this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there

Original is:   from “17/01/2010” through 17/01/2010
First method:  2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method:  from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17

Original is:   𐅹17/01/2010–𑂽17/01/2010
First method:  2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method:  𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17

Now let’s suppose that you actually do want to match non-ASCII digits. For example:

   U+660  ARABIC-INDIC DIGIT ZERO
   U+661  ARABIC-INDIC DIGIT ONE
   U+662  ARABIC-INDIC DIGIT TWO
   U+663  ARABIC-INDIC DIGIT THREE
   U+664  ARABIC-INDIC DIGIT FOUR
   U+665  ARABIC-INDIC DIGIT FIVE
   U+666  ARABIC-INDIC DIGIT SIX
   U+667  ARABIC-INDIC DIGIT SEVEN
   U+668  ARABIC-INDIC DIGIT EIGHT
   U+669  ARABIC-INDIC DIGIT NINE

or even

 U+1D7F6  MATHEMATICAL MONOSPACE DIGIT ZERO
 U+1D7F7  MATHEMATICAL MONOSPACE DIGIT ONE
 U+1D7F8  MATHEMATICAL MONOSPACE DIGIT TWO
 U+1D7F9  MATHEMATICAL MONOSPACE DIGIT THREE
 U+1D7FA  MATHEMATICAL MONOSPACE DIGIT FOUR
 U+1D7FB  MATHEMATICAL MONOSPACE DIGIT FIVE
 U+1D7FC  MATHEMATICAL MONOSPACE DIGIT SIX
 U+1D7FD  MATHEMATICAL MONOSPACE DIGIT SEVEN
 U+1D7FE  MATHEMATICAL MONOSPACE DIGIT EIGHT
 U+1D7FF  MATHEMATICAL MONOSPACE DIGIT NINE

So imagine you have a date in mathematical monospace digits, like this:

$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";

The Perl code will work just fine on that:

Original is:   𝟷𝟽/𝟷𝟶/𝟸𝟶𝟷𝟶
First method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Second method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Third method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Fourth method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽

I think you’ll find that Python has a pretty brain‐damaged Unicode model whose lack of support for abstract characters and strings irrespective of content makes it ridiculously difficult to write things like this.

It’s also tough to write legible regular expressions in Python where you decouple the declaration of the subexpressions from their execution, since (?(DEFINE)...) blocks are not supported there. Heck, Python doesn’t even support Unicode properties. It’s just not suitable for Unicode regex work because of this.

But hey, if you think that’s bad in Python compared to Perl (and it certainly is), just try any other language. I haven’t found one that isn’t still worse for this sort of work.

As you see, you run into real problems when you ask for regex solutions from multiple languages. First of all, the solutions are difficult to compare because of the different regex flavors. But also because no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.

So if you just wanted Python, you should have asked for only that. Otherwise it’s a terribly unfair contest that Python will nearly always lose; it’s just too messy to get things like this correct in Python, let alone both correct and clean. That’s asking more of it than it can produce.

In contrast, Perl’s regexes excel at both those.

answered Sep 25 '22 02:09

tchrist

Related questions
                            
                                Deleting files which start with a name Python
                            
                                HTTP basic authentication not working in python 3.4
                            
                                Uses of Python's "from" keyword?
                            
                                Flask - wtforms: Validation always false
                            
                                Why is IronPython faster than the Official Python Interpreter
                            
                                How do I select from multiple tables in one query with Django?
                            
                                Image aspect ratio using Reportlab in Python
                            
                                Does paramiko close ssh connection on a non-paramiko exception
                            
                                How to manage division of huge numbers in Python?
                            
                                SSLError installing with pip
                            
                                Best way to iterate through all rows in a DB-table
                            
                                Cannot import name 'MappingProxyType' error after importing functools
                            
                                Append the same value multiple times to a list [duplicate]
                            
                                pip install dependency links
                            
                                Remove Part of String Before the Last Forward Slash
                            
                                Can the execution of statements in Python be delayed?
                            
                                How to kill all instance of uwsgi
                            
                                Tensorflow Assign requires shapes of both tensors to match. lhs shape= [20] rhs shape= [48]
                            
                                pylint no member issue but code still works vscode
                            
                                How to remove tags from a string in python using regular expressions? (NOT in HTML)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

Tags:

python

date

perl

text-processing

FunLovinCoder

People also ask

1 Answers

tchrist

Recent Activity

Donate For Us