Email datetime parsing with python

I am trying to parse the date and time of an email using a Python script.

When I open the mail details, the date value looks like this:

from:    [email protected]
to:      [email protected]
date:    Tue, Aug 28, 2012 at 1:19 PM
subject: Subject of that mail

I am using code like this:

import email
from email.utils import parseaddr

# str1 holds the raw source of the message
mail = email.message_from_string(str1)
#to = re.sub('</br>','',mail["To"])
to = parseaddr(mail.get('To'))[1]
sender = parseaddr(mail.get('From'))[1]
cc_is = parseaddr(mail.get('Cc'))[1]
date = mail["Date"]
print(date)
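To reproduce this without a real mailbox, here is a self-contained sketch that feeds a made-up message through the same calls. The addresses are hypothetical placeholders. It shows that `mail["Date"]` simply returns the header text exactly as it appears in the message source, not a parsed or converted value.

```python
import email
from email.utils import parseaddr

# A made-up message; the addresses are hypothetical placeholders.
raw = (
    "From: Sender <sender@example.com>\n"
    "To: Recipient <recipient@example.com>\n"
    "Date: Tue, 28 Aug 2012 02:49:13 -0500\n"
    "Subject: Subject of that mail\n"
    "\n"
    "Body text.\n"
)

mail = email.message_from_string(raw)
to = parseaddr(mail.get('To'))[1]
sender = parseaddr(mail.get('From'))[1]
date = mail["Date"]  # the raw header string, not a parsed datetime
print(date)
```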

However, when I parse the same mail with Python, the date comes out like this, with a time offset:

Tue, 28 Aug 2012 02:49:13 -0500

What I am actually hoping for is:

Tue, Aug 28, 2012 at 1:19 PM

I am confused about the relationship between these two values. Can anybody help me figure it out? I need to get the same time that is shown in the mail details.

asked Aug 28 '12 at 13:08 by chirag ghiyad


1 Answer

When you look at an email in GMail, the date and time it was sent are displayed in your local timezone. The header value "Tue, 28 Aug 2012 02:49:13 -0500" is parsed, converted to your local timezone, and formatted in a GMail-specific manner. Here, 02:49:13 at UTC-05:00 is 07:49:13 UTC, which is 1:19 PM in a UTC+05:30 timezone such as India Standard Time.
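The arithmetic behind that relationship can be checked with the standard library's timezone-aware datetimes (Python 3). This is a minimal sketch; it assumes the reader's clock is a fixed UTC+05:30 offset, which is only inferred from the "1:19 PM" shown in the question.

```python
from datetime import datetime, timedelta, timezone

# The header value: 02:49:13 at UTC-05:00.
sent = datetime(2012, 8, 28, 2, 49, 13, tzinfo=timezone(timedelta(hours=-5)))

# Assumption: the reader's clock is UTC+05:30, inferred from "1:19 PM".
reader_tz = timezone(timedelta(hours=5, minutes=30))

as_utc = sent.astimezone(timezone.utc)  # 2012-08-28T07:49:13+00:00
as_local = sent.astimezone(reader_tz)   # 2012-08-28T13:19:13+05:30

print(as_utc.isoformat())
print(as_local.isoformat())
# Aware datetimes compare by instant, so all three are the same moment.
print(sent == as_utc == as_local)
```

Only the wall-clock representation changes between the three values; the moment in time is identical.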

Parsing and formatting the stdlib way

The email.utils module includes a parsedate_tz() function that specifically deals with email headers with timezone offsets.

It returns a tuple compatible with time.struct_time, with the timezone offset (in seconds) added as a tenth element. The companion mktime_tz() function converts that tuple to a UNIX timestamp (seconds since the epoch, in UTC), which can then easily be converted to a datetime.datetime object.
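A minimal Python 3 sketch of that chain, from header string to tuple to timestamp to an aware UTC datetime. Using `datetime.fromtimestamp()` with `tz=timezone.utc` for the last step is one reasonable choice, not the only one.

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz

header = 'Tue, 28 Aug 2012 02:49:13 -0500'

tt = parsedate_tz(header)  # struct_time-like fields plus the UTC offset in seconds
offset = tt[9]             # -18000 seconds, i.e. UTC-05:00
timestamp = mktime_tz(tt)  # seconds since the UNIX epoch (UTC)

# An aware datetime in UTC built from the timestamp
dt_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(dt_utc)  # 2012-08-28 07:49:13+00:00
```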

The same module also has a formatdate() function to convert a UNIX timestamp to an email-compatible date string:

>>> from email.utils import parsedate_tz, mktime_tz, formatdate
>>> date = 'Tue, 28 Aug 2012 02:49:13 -0500'
>>> tt = parsedate_tz(date)
>>> timestamp = mktime_tz(tt)
>>> print(formatdate(timestamp))
Tue, 28 Aug 2012 07:49:13 -0000

Now we have a formatted date in UTC, suitable for outgoing emails. To print it in your local timezone (as determined by your computer), set the localtime flag to True:

>>> print(formatdate(timestamp, localtime=True))
Tue, 28 Aug 2012 08:49:13 +0100

Parsing and formatting using better tools

Note that things get hairy as we try to deal with timezones, and the formatdate() function gives you no options to format the date differently (as GMail does), nor does it let you choose a different timezone to work with.
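On Python 3.3 and later, the standard library closes part of this gap: `email.utils.parsedate_to_datetime()` returns an aware datetime, which can be moved to any `tzinfo` and then formatted with `strftime()` or `email.utils.format_datetime()`. This sketch does not apply to the Python 2 code above. The target zone here is an assumed fixed UTC+05:30 offset, with no DST rules.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

header = 'Tue, 28 Aug 2012 02:49:13 -0500'
dt = parsedate_to_datetime(header)  # aware datetime with a UTC-05:00 tzinfo

# Assumed target zone: a fixed UTC+05:30 offset (no DST handling)
target = timezone(timedelta(hours=5, minutes=30))
local_dt = dt.astimezone(target)

print(format_datetime(local_dt))                       # RFC 5322 style
print(local_dt.strftime('%a, %b %d, %Y at %I:%M %p'))  # GMail-like
```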

Enter the external python-dateutil package. Its parser.parse() function can handle just about any date string, and it supports timezones properly:

>>> import dateutil.parser
>>> dt = dateutil.parser.parse(date)
>>> dt
datetime.datetime(2012, 8, 28, 2, 49, 13, tzinfo=tzoffset(None, -18000))

The parse() function returns a datetime.datetime instance, which makes formatting a lot easier. Now we can use its .strftime() method to output the date the way your email client does:

>>> print(dt.strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 02:49 AM
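One difference remains: `%I` always zero-pads the hour ("02"), while the question shows "1:19 PM". Padding-removal flags such as `%-I` are platform-specific and not supported on Windows. This sketch therefore formats the hour by hand. `gmail_style` is a hypothetical helper that matches the sample string in the question, not any GMail API.

```python
from datetime import datetime


def gmail_style(dt):
    """Hypothetical helper: format like 'Tue, Aug 28, 2012 at 1:19 PM'.

    The 12-hour clock hour is computed manually so it is never zero-padded,
    avoiding non-portable strftime flags.
    """
    hour12 = dt.hour % 12 or 12  # 0 -> 12 (midnight), 13 -> 1, etc.
    return '{}, {} {}, {} at {}:{:02d} {}'.format(
        dt.strftime('%a'), dt.strftime('%b'), dt.day, dt.year,
        hour12, dt.minute, dt.strftime('%p'))


print(gmail_style(datetime(2012, 8, 28, 13, 19, 13)))
```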

That's still in the sender's timezone (UTC-05:00), of course. To convert it to your own timezone instead, use the .astimezone() method with a different tzinfo object. The python-dateutil package provides some handy ones.

Here is how to print it in your machine's local timezone:

>>> import dateutil.tz
>>> print(dt.astimezone(dateutil.tz.tzlocal()).strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 09:49 AM
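If installing python-dateutil is not an option, Python 3.3+ offers a stdlib route to the machine's local timezone: calling `astimezone()` with no argument on an aware datetime. This is a minimal sketch. The printed result depends on the machine's timezone settings, so no expected output is given.

```python
from datetime import datetime, timedelta, timezone

# The parsed header value: 02:49:13 at UTC-05:00
dt = datetime(2012, 8, 28, 2, 49, 13, tzinfo=timezone(timedelta(hours=-5)))

# With no argument, astimezone() converts to the system's local timezone
local_dt = dt.astimezone()
print(local_dt.strftime('%a, %b %d, %Y at %I:%M %p'))
```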

Or use a specific timezone instead. dateutil.tz.gettz() looks up IANA zone names such as Asia/Kolkata:

>>> print(dt.astimezone(dateutil.tz.gettz('Asia/Kolkata')).strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 01:19 PM
answered Sep 27 '22 at 17:09 by Martijn Pieters