Dateutil is a great tool for parsing dates in string format. For example,
from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")
returns
datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
However,
parse("Ter, 01 Out 2013 14:26:00 -0300")  # in Portuguese
yields this error:
ValueError: unknown string format
Does anybody know how to make dateutil aware of the locale?
The calendar module already has constants for many languages (their values follow the current locale). I think the best solution is to customize dateutil's parser using these constants. This is a simple solution that should work for many languages. I haven't tested it much, so use it with caution.
Create a module localeparserinfo.py and subclass parser.parserinfo:
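import calendar
from dateutil import parser

class LocaleParserInfo(parser.parserinfo):
    # The names come from the locale that is active when this class is defined.
    # list() keeps the pairs reusable; a bare zip() is a one-shot iterator on Python 3.
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]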
Now you can pass an instance of your new parserinfo class as the parserinfo argument to dateutil.parser.parse:
In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo
In [3]: from dateutil.parser import parse
In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
It solved my problem, but note that this is not a complete solution for all possible dates and times. Take a look at dateutil's parser.py, especially the parserinfo class variables such as HMS. You'll probably be able to use other constants from the calendar module.
You can even pass the locale string as an argument to your parserinfo class.
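As a rough sketch of that last idea, the name tables can be built for a locale passed in as a string, switching LC_TIME only for the duration of the lookup. The helper names locale_names and make_parserinfo are made up for this illustration and are not part of dateutil; the requested locale must be installed on the system, and changing the locale is not thread-safe on most platforms.

```python
import calendar
import locale


def locale_names(locale_name):
    """Return (weekdays, months) name tables for the given locale."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: just query
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        # calendar's name sequences are evaluated lazily, so copy them
        # into lists while the requested locale is still active.
        weekdays = list(zip(calendar.day_abbr, calendar.day_name))
        months = list(zip(calendar.month_abbr, calendar.month_name))[1:]
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the previous setting
    return weekdays, months


def make_parserinfo(locale_name):
    """Build a parserinfo instance for locale_name (hypothetical helper)."""
    from dateutil import parser  # third-party; imported only when needed

    class LocaleParserInfo(parser.parserinfo):
        WEEKDAYS, MONTHS = locale_names(locale_name)

    return LocaleParserInfo()
```

With the pt_BR.utf8 locale installed, parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=make_parserinfo("pt_BR.utf8")) should then behave like the session above, without leaving the process locale changed.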
You could use PyICU to parse a localized date/time string in a given format:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu # PyICU
df = icu.SimpleDateFormat(
    'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)
It works on Python 2/3. It does not modify global state (locale).
If your actual input time string does not contain an explicit UTC offset, you should tell ICU which timezone to use; otherwise you can get a wrong result (ICU and datetime may use different timezone definitions).
If you only need to support Python 3 and you don't mind setting the locale, you could use datetime.strptime() as @alexwlchan suggested:
#!/usr/bin/env python3
import locale
from datetime import datetime
locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z"))  # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
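If changing the locale for the whole process is a concern, one option is to switch LC_TIME only around the strptime() call and restore it afterwards. The following is a minimal sketch of that idea; time_locale and parse_localized are names invented here, and since setlocale() is not thread-safe on most systems this only suits single-threaded code.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def time_locale(name):
    """Temporarily switch LC_TIME to `name`, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: just query
    try:
        yield locale.setlocale(locale.LC_TIME, name)
    finally:
        locale.setlocale(locale.LC_TIME, saved)


def parse_localized(text, fmt, locale_name):
    """Parse `text` with strptime while `locale_name` is active."""
    with time_locale(locale_name):
        return datetime.strptime(text, fmt)
```

For the example above, the call would be parse_localized("Ter, 01 Out 2013 14:26:00 -0300", "%a, %d %b %Y %H:%M:%S %z", "pt_PT.UTF-8"), assuming that locale is available on the machine.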