I am trying to parse an RSS feed. Entries in the feed have date elements like:
<dc:date>2016-09-21T16:00:00+02:00</dc:date>
Using feedparser, I try to do:
published_time = datetime.fromtimestamp(mktime(entry.published_parsed))
But the problem is that I seem to be getting the wrong time stored in the database. In this particular case, the datetime is stored as:
2016-09-21 13:00:00
... when I would expect 14:00 - the correct UTC time.
I assume the problem is in our django settings, where we have:
TIME_ZONE = 'Europe/Berlin'
Because when I switch to:
TIME_ZONE = 'UTC'
... the datetime is stored as the correct UTC time:
2016-09-21 14:00:00
Is there any way to keep the django settings as they are, but to parse and store this datetime correctly, without the django timezone setting affecting it?
EDIT: Maybe it's clearer like this:
print entry.published_parsed
published_time = datetime.fromtimestamp(mktime(entry.published_parsed))
print published_time
localized_time = pytz.timezone(settings.TIME_ZONE).localize(published_time, is_dst=None)
print localized_time
This prints:
time.struct_time(tm_year=2016, tm_mon=9, tm_mday=21, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=265, tm_isdst=0)
2016-09-21 15:00:00
2016-09-21 15:00:00+02:00
feedparser's entry.published_parsed is always a UTC time tuple, whatever the input time string is. To get a timezone-aware datetime object:
from datetime import datetime
utc_time = datetime(*entry.published_parsed[:6], tzinfo=utc)
where utc is a tzinfo object such as datetime.timezone.utc, pytz.utc, or your own custom tzinfo (for older Python versions).
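As a minimal, self-contained sketch of that conversion: the struct_time literal below only stands in for entry.published_parsed, the fixed +02:00 offset is an assumption used to compare against the feed's original string, and Python 3 is assumed.

```python
import time
from datetime import datetime, timedelta, timezone

# Stand-in for entry.published_parsed (feedparser hands back a UTC time tuple).
published_parsed = time.struct_time((2016, 9, 21, 14, 0, 0, 2, 265, 0))

# Build an aware datetime straight from the first six fields; no mktime() involved.
utc_time = datetime(*published_parsed[:6], tzinfo=timezone.utc)

# Viewing the same instant at +02:00 gives back the wall-clock time from the feed.
feed_offset = timezone(timedelta(hours=2))
local_view = utc_time.astimezone(feed_offset)

print(utc_time.isoformat())    # 2016-09-21T14:00:00+00:00
print(local_view.isoformat())  # 2016-09-21T16:00:00+02:00
```

Both values describe the same instant, so neither depends on the machine's or Django's time zone.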
You shouldn't pass a UTC time to mktime(), which expects a local time. The same error is discussed in "Have a correct datetime with correct timezone".
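If an epoch timestamp is really needed, one option is calendar.timegm() from the standard library, which reads the tuple as UTC. The sketch below is illustrative only: the struct_time literal stands in for entry.published_parsed, and the size of the mktime() skew depends on the local time zone of whatever machine runs it.

```python
import calendar
import time
from datetime import datetime, timezone

# Stand-in for entry.published_parsed (a UTC time tuple).
published_parsed = time.struct_time((2016, 9, 21, 14, 0, 0, 2, 265, 0))

# calendar.timegm() interprets the tuple as UTC, so the result is the same
# on every machine regardless of its local time zone.
epoch_seconds = calendar.timegm(published_parsed)
utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# time.mktime() interprets the same tuple as local time, so its result is
# shifted by whatever offset the local zone has (the value varies by machine).
skew = time.mktime(published_parsed) - epoch_seconds

print(utc_time.isoformat())
print("mktime skew in seconds:", skew)
```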
Make sure USE_TZ=True so that Django uses aware datetime objects everywhere. Given a timezone-aware datetime object, Django should save it to the database correctly, whatever your TIME_ZONE or timezone.get_current_timezone() are.
Have you tried using datetime.utcfromtimestamp() instead of datetime.fromtimestamp()?
As a secondary solution, you can get the unparsed data (I believe it's available as entry.published?) and use python-dateutil to parse the string, then convert it to the pytz.utc timezone like this:
>>> import pytz
>>> from dateutil import parser
>>> dt = parser.parse('2016-09-21T16:00:00+02:00')
>>> dt
datetime.datetime(2016, 9, 21, 16, 0, tzinfo=tzoffset(None, 7200))
>>> dt.astimezone(pytz.utc)
datetime.datetime(2016, 9, 21, 14, 0, tzinfo=<UTC>)
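If installing python-dateutil is not an option, the same idea can probably be sketched with the standard library alone. This assumes Python 3.7 or later, where datetime.fromisoformat() accepts the "+02:00" offset form, and it assumes the feed string is in exactly that ISO 8601 shape.

```python
from datetime import datetime, timezone

# Parse the raw feed string; the +02:00 offset becomes part of the aware datetime.
dt = datetime.fromisoformat('2016-09-21T16:00:00+02:00')

# Convert the same instant to UTC.
utc_dt = dt.astimezone(timezone.utc)

print(utc_dt.isoformat())  # 2016-09-21T14:00:00+00:00
```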