Python email.header.decode_header fails for multiline headers

Question

I'm building a system that reads emails from a gmail account and fetches the subjects, using Python's imaplib and email modules. Sometimes, emails received from a hotmail account have line breaks in their headers, for instance:

In [4]: message['From']
Out[4]: '=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>'

If I try to decode that header, it does nothing:

In [5]: email.header.decode_header(message['From'])
Out[5]: [('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>', None)]

However, if I replace the line break and tab with a space, it works:

In [6]: email.header.decode_header(message['From'].replace('
	', ' '))
Out[6]: [('isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), ('<isatocino22@hotmail.com>', None)]

Is this a bug in decode_header? If not, I would like to know what other special cases like this I should be aware of.

Robᵩ · Accepted Answer

It is a bug in decode_header, which bug is present in python2.7 and fixed in python3.3.

>>> sys.version_info
sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)
>>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>')
[(b'isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), (b'<isatocino22@hotmail.com>', None)]

vs

>>> sys.version_info
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
>>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>')
[('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>', None)]

Benjy Malca · Answer

This error is still happening in some Python 2.7 versions, so the following workaround could be used:

>>> email.header.decode_header('=?utf-8?B?aXNhYmVsIG1hcsOtYSB0b2Npbm8gZ2FyY8OtYQ==?=
	<isatocino22@hotmail.com>'.replace('
	', ' '))
[('isabel mar\xc3\xada tocino garc\xc3\xada', 'utf-8'), ('<isatocino22@hotmail.com>', None)]

It replaces the CLRF and the tab feed for a whitespace. With this, decode_header will parse correctly the header.

Python email.header.decode_header fails for multiline headers

Tags:

python

email

José Tomás Tocino

2 Answers

Robᵩ

Benjy Malca

Recent Activity

Donate For Us

Python email.header.decode_header fails for multiline headers

Tags:

python

email

José Tomás Tocino

2 Answers

Robᵩ

Benjy Malca

Related questions

Recent Activity

Donate For Us