Using dateutil.parser to parse a date in another language

dateutil is a great tool for parsing dates given as strings. For example,

from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")

returns

datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

However,

parse("Ter, 01 Out 2013 14:26:00 -0300")  # in Portuguese

yields this error:

ValueError: unknown string format

Does anybody know how to make dateutil aware of the locale?

fccoelho asked Nov 12 '13 11:11


2 Answers

The calendar module already has locale-aware constants for weekday and month names in many languages. I think the best solution is to customize dateutil's parser using these constants. It is simple and should work for many languages, but I haven't tested it much, so use it with caution.

Create a module localeparserinfo.py and subclass parser.parserinfo:

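import calendar
from dateutil import parser

class LocaleParserInfo(parser.parserinfo):
    # The names are read from the locale that is active when this module is imported.
    # list() is needed on Python 3, where zip() returns a one-shot iterator.
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]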

Now you can pass an instance of your new parserinfo class to dateutil.parser.parse via the parserinfo parameter. Set the locale before importing the module:

In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo

In [3]: from dateutil.parser import parse

In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

It solved my problem, but note that it does not cover every possible date and time format. Take a look at dateutil's parser.py, especially the class variables of parserinfo, such as HMS. You can probably fill some of them from other constants in the calendar module.
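
As a rough sketch of what overriding those class variables can look like, the names can also be written out by hand, which avoids depending on a locale being installed on the system. The class name PtParserInfo and the word lists below are illustrative assumptions, not something dateutil ships, and the lists are not a complete description of Portuguese date formats.

```python
from dateutil import parser


class PtParserInfo(parser.parserinfo):
    """Hand-written Portuguese names; a hypothetical class, not part of dateutil."""

    # (abbreviation, full name) pairs, Monday first, as in dateutil's defaults.
    WEEKDAYS = [
        ("Seg", "Segunda"), ("Ter", "Terça"), ("Qua", "Quarta"),
        ("Qui", "Quinta"), ("Sex", "Sexta"), ("Sáb", "Sábado"),
        ("Dom", "Domingo"),
    ]
    # (abbreviation, full name) pairs, January first.
    MONTHS = [
        ("Jan", "Janeiro"), ("Fev", "Fevereiro"), ("Mar", "Março"),
        ("Abr", "Abril"), ("Mai", "Maio"), ("Jun", "Junho"),
        ("Jul", "Julho"), ("Ago", "Agosto"), ("Set", "Setembro"),
        ("Out", "Outubro"), ("Nov", "Novembro"), ("Dez", "Dezembro"),
    ]
    # Unit words for hours, minutes and seconds (same shape as the default HMS).
    HMS = [
        ("h", "hora", "horas"),
        ("m", "minuto", "minutos"),
        ("s", "segundo", "segundos"),
    ]


dt = parser.parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=PtParserInfo())
print(dt)  # 2013-10-01 14:26:00-03:00
```

The lookup is case-insensitive, so "ter" and "TER" are matched as well.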

You can even pass the locale string as an argument to your parserinfo class.
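
One possible way to do that is sketched below, assuming the requested locale is installed. The class name LocaleAwareParserInfo is hypothetical. It switches LC_TIME only while the names are read and then restores the previous setting. Note that locale.setlocale affects the whole process, so this is not safe to call from several threads at once.

```python
import calendar
import locale

from dateutil import parser


class LocaleAwareParserInfo(parser.parserinfo):
    """parserinfo that takes a locale name; hypothetical, not part of dateutil."""

    def __init__(self, locale_name, dayfirst=False, yearfirst=False):
        saved = locale.setlocale(locale.LC_TIME)  # query only, changes nothing
        try:
            locale.setlocale(locale.LC_TIME, locale_name)
            # calendar's names are looked up lazily, so materialize them
            # while the requested locale is still active.
            self.WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
            self.MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]
        finally:
            locale.setlocale(locale.LC_TIME, saved)
        # The base class builds its lookup tables from WEEKDAYS and MONTHS.
        super().__init__(dayfirst, yearfirst)


# Usage, assuming the "pt_BR.utf8" locale exists on the system:
# info = LocaleAwareParserInfo("pt_BR.utf8")
# parser.parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=info)
```

Unlike the module-level class above, this version does not depend on which locale was active at import time.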

neves answered Oct 21 '22 04:10


You could use PyICU to parse a localized date/time string in a given format:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu  # PyICU

df = icu.SimpleDateFormat(
               'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)

It works on Python 2/3. It does not modify global state (locale).

If your actual input time string does not contain an explicit UTC offset, you should specify the timezone for ICU to use explicitly. Otherwise you can get a wrong result (ICU and datetime may use different timezone definitions).

If you only need to support Python 3 and you don't mind setting the locale, you could use datetime.strptime() as @alexwlchan suggested:

#!/usr/bin/env python3
import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
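
If changing the locale permanently is a concern, a possible variation is to switch LC_TIME only for the duration of the strptime() call. This is a minimal sketch; time_locale and parse_localized are hypothetical helper names. The locale is still process-wide while the block runs, so it is not thread-safe.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def time_locale(name):
    """Temporarily switch LC_TIME to `name` and restore the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # query only, changes nothing
    try:
        yield locale.setlocale(locale.LC_TIME, name)
    finally:
        locale.setlocale(locale.LC_TIME, saved)


def parse_localized(text, locale_name):
    """Parse an RFC 2822-style date whose day and month names are in `locale_name`."""
    with time_locale(locale_name):
        return datetime.strptime(text, "%a, %d %b %Y %H:%M:%S %z")


# Usage, assuming the "pt_PT.UTF-8" locale exists on the system:
# parse_localized("Ter, 01 Out 2013 14:26:00 -0300", "pt_PT.UTF-8")
```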

jfs answered Oct 21 '22 04:10