I have lots of dates in a column in a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example 17/01/2010 should be converted to 2010-01-17.
How can I do this in Perl or Python?
yyyy-mm-dd stands for year-month-day . We can convert string format to datetime by using the strptime() function. We will use the '%Y/%m/%d' format to get the string to datetime.
Your answer First, pick the cells that contain dates, then right-click and select Format Cells. Select Custom in the Number Tab, then type 'dd-mmm-yyyy' in the Type text box, then click okay. It will format the dates you specify.
You can use the POSIX function strftime() to format date and time with the help of the following table.
If you are guaranteed to have well-formed data consisting of nothing else but a singleton date in the DD-MM-YYYY format, then this works:
# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));
That works on a $date
holding "07/04/1776" but fails on "this 17/01/2010 and that 01/17/2010 there". Instead, use:
# SECOND METHOD
($ndate = $date) =~ s{
\b
( \d \d )
/ ( \d \d )
/ ( \d {4} )
\b
}{$3-$2-$1}gx;
If you prefer a more "grammatical" regex, so that itโs easier to maintain and update, you can instead use this:
# THIRD METHOD
($ndate = $date) =~ s{
(?&break)
(?<DAY> (?&day) )
(?&slash) (?<MONTH> (?&month) )
(?&slash) (?<YEAR> (?&year) )
(?&break)
(?(DEFINE)
(?<break> \b )
(?<slash> / )
(?<year> \d {4} )
(?<month> \d {2} )
(?<day> \d {2} )
)
}{
join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
Finally, if you have Unicode data, you might want to be a bit more careful.
# FOURTH METHOD
($ndate = $date) =~ s{
(?&break_before)
(?<DAY> (?&day) )
(?&slash) (?<MONTH> (?&month) )
(?&slash) (?<YEAR> (?&year) )
(?&break_after)
(?(DEFINE)
(?<slash> / )
(?<start> \A )
(?<finish> \z )
# don't really want to use \D or [^0-9] here:
(?<break_before>
(?<= [\pC\pP\pS\p{Space}] )
| (?<= \A )
)
(?<break_after>
(?= [\pC\pP\pS\p{Space}]
| \z
)
)
(?<digit> \d )
(?<year> (?&digit) {4} )
(?<month> (?&digit) {2} )
(?<day> (?&digit) {2} )
)
}{
join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
You can see how each of these four approaches performs when confronted with sample input strings like these:
my $sample = q(17/01/2010);
my @strings = (
$sample, # trivial case
# multiple case
"this $sample and that $sample there",
# multiple case with non-ASCII BMP code points
# U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
"from \x{201c}$sample\x{201d} through\xA0$sample",
# multiple case with non-ASCII code points
# from both the BMP and the SMP
# code point U+02013 is EN DASH, props \pP \p{Pd}
# code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
# code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
"\x{10179}$sample\x{2013}\x{110BD}$sample",
);
Now letting $date
be a foreach
iterator through that array, we get this output:
Original is: 17/01/2010
First method: 2010-01-17
Second method: 2010-01-17
Third method: 2010-01-17
Fourth method: 2010-01-17
Original is: this 17/01/2010 and that 17/01/2010 there
First method: 2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method: this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there
Original is: from โ17/01/2010โ throughย 17/01/2010
First method: 2010-01-2010โ throughย 17-01-from โ17
Second method: from โ2010-01-17โ throughย 2010-01-17
Third method: from โ2010-01-17โ throughย 2010-01-17
Fourth method: from โ2010-01-17โ throughย 2010-01-17
Original is: ๐
น17/01/2010โ๐ฝ17/01/2010
First method: 2010-01-2010โ๐ฝ17-01-๐
น17
Second method: ๐
น2010-01-17โ๐ฝ2010-01-17
Third method: ๐
น2010-01-17โ๐ฝ2010-01-17
Fourth method: ๐
น2010-01-17โ๐ฝ2010-01-17
Now letโs suppose that you actually do want to match non-ASCII digits. For example:
U+660 ARABIC-INDIC DIGIT ZERO
U+661 ARABIC-INDIC DIGIT ONE
U+662 ARABIC-INDIC DIGIT TWO
U+663 ARABIC-INDIC DIGIT THREE
U+664 ARABIC-INDIC DIGIT FOUR
U+665 ARABIC-INDIC DIGIT FIVE
U+666 ARABIC-INDIC DIGIT SIX
U+667 ARABIC-INDIC DIGIT SEVEN
U+668 ARABIC-INDIC DIGIT EIGHT
U+669 ARABIC-INDIC DIGIT NINE
or even
U+1D7F6 MATHEMATICAL MONOSPACE DIGIT ZERO
U+1D7F7 MATHEMATICAL MONOSPACE DIGIT ONE
U+1D7F8 MATHEMATICAL MONOSPACE DIGIT TWO
U+1D7F9 MATHEMATICAL MONOSPACE DIGIT THREE
U+1D7FA MATHEMATICAL MONOSPACE DIGIT FOUR
U+1D7FB MATHEMATICAL MONOSPACE DIGIT FIVE
U+1D7FC MATHEMATICAL MONOSPACE DIGIT SIX
U+1D7FD MATHEMATICAL MONOSPACE DIGIT SEVEN
U+1D7FE MATHEMATICAL MONOSPACE DIGIT EIGHT
U+1D7FF MATHEMATICAL MONOSPACE DIGIT NINE
So imagine you have a date in mathematical monospace digits, like this:
$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";
The Perl code will work just fine on that:
Original is: ๐ท๐ฝ/๐ท๐ถ/๐ธ๐ถ๐ท๐ถ
First method: ๐ธ๐ถ๐ท๐ถ-๐ท๐ถ-๐ท๐ฝ
Second method: ๐ธ๐ถ๐ท๐ถ-๐ท๐ถ-๐ท๐ฝ
Third method: ๐ธ๐ถ๐ท๐ถ-๐ท๐ถ-๐ท๐ฝ
Fourth method: ๐ธ๐ถ๐ท๐ถ-๐ท๐ถ-๐ท๐ฝ
I think youโll find that Python has a pretty brainโdamaged Unicode model whose lack of support for abstract characters and strings irrespective of content makes it ridiculously difficult to write things like this.
Itโs also tough to write legible regular expressions in Python where you decouple the declaration of the subexpressions from their execution, since (?(DEFINE)...)
blocks are not supported there. Heck, Python doesnโt even support Unicode properties. Itโs just not suitable for Unicode regex work because of this.
But hey, if you think thatโs bad in Python compared to Perl (and it certainly is), just try any other language. I havenโt found one that isnโt still worse for this sort of work.
As you see, you run into real problems when you ask for regex solutions from multiple languages. First of all, the solutions are difficult to compare because of the different regex flavors. But also because no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.
So if you just wanted Python, you should have asked for only that. Otherwise itโs a terribly unfair contest that Python will nearly always lose; itโs just too messy to get things like this correct in Python, let alone both correct and clean. Thatโs asking more of it than it can produce.
In contrast, Perlโs regexes excel at both those.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With