Pandas is pretty good at parsing date strings when they are in English:
In [1]: pd.to_datetime("11 January 2014 at 10:50AM")
Out[1]: Timestamp('2014-01-11 10:50:00')
I'm wondering if there's an easy way to do the same in pandas when the strings are in another language, for example French:
In [2]: pd.to_datetime("11 Janvier 2016 à 10:50")
ValueError: Unknown string format
Ideally, there would be a way to do it directly in pd.read_csv.
There is a module named dateparser that can handle numerous languages, including French, Russian, Spanish, Dutch and over 20 more. It can also recognize things like time zone abbreviations.
Let's confirm it works for a single date:
In [1]: import dateparser
dateparser.parse('11 Janvier 2016 à 10:50')
Out[1]: datetime.datetime(2016, 1, 11, 10, 50)
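To make explicit what a parser has to do for one known language, here is a minimal dependency-free sketch. It is not how dateparser works internally; it is a hand-rolled lookup that only handles the "day month year [à HH:MM]" shape used in this thread. The names FRENCH_MONTHS and parse_french_date are hypothetical.

```python
from datetime import datetime

# Lowercase French month names mapped to month numbers
# (assumed spellings, with and without accents).
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}


def parse_french_date(text):
    """Parse strings such as '11 Janvier 2016 à 10:50' or '7 janvier 1983'."""
    # Split off an optional time part introduced by ' à '.
    date_part, _, time_part = text.partition(" à ")
    day, month_name, year = date_part.split()
    month = FRENCH_MONTHS[month_name.lower()]
    hour, minute = 0, 0
    if time_part:
        hour, minute = (int(p) for p in time_part.strip().split(":"))
    return datetime(int(year), month, int(day), hour, minute)


print(parse_french_date("11 Janvier 2016 à 10:50"))  # 2016-01-11 10:50:00
```

This only makes sense when the language and layout are fixed and known in advance; for mixed or unknown input, dateparser remains the more practical choice.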
Moving on to parsing this test_dates.csv file:
                Date  Value
0     7 janvier 1983     10
1   21 décembre 1986     21
2     1 janvier 2016     12
You can actually use dateparser.parse as the parser (note that the date_parser argument is deprecated since pandas 2.0):
In [2]: df = pd.read_csv('test_dates.csv',
                         parse_dates=['Date'], date_parser=dateparser.parse)
        print(df)
Out[2]:
        Date  Value
0 1983-01-07     10
1 1986-12-21     21
2 2016-01-01     12
If you need to do this after the dataframe has already been loaded, you can use apply or map:
# Using apply (6.22 ms per loop)
df.Date = df.Date.apply(lambda x: dateparser.parse(x))
# Or map, which was slightly slower in this test (7.75 ms per loop)
df.Date = df.Date.map(dateparser.parse)