
Parsing "From:" field of an e-mail message in Python

Tags: python, email

I am trying to parse an RFC 5322-compliant "From:" field of an e-mail message into two parts, the display-name and the e-mail address, in Python 2.7. The display-name may be empty. A familiar example is:

John Smith <[email protected]>

In the example above, John Smith is the display-name and [email protected] is the e-mail address. But the following is also a valid "From:" field:

"unusual" <"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com>

In this example, the display-name to return is

"unusual" 

and

"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com

is the e-mail address.

You can use grammars to parse this in Perl (as explained in these questions: Using a regular expression to validate an email address and The recognizing power of “modern” regexes), but I'd like to do this in Python 2.7. I tried the email.parser module, but it seems able to separate only the header fields themselves, that is, the name before the colon from the value after it. So, if you do something like

from email.parser import Parser
headers = Parser().parsestr('From: "John Smith" <[email protected]>')
print headers['from'] 

it prints

"John Smith" <[email protected]> 

while if you replace the last line in the above code with

print headers['display-name']

it prints

None

I would very much appreciate any suggestions or comments.

user765195 asked Feb 15 '23 02:02


1 Answer

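headers['display-name'] is not part of the email.parser API. The parsed message is indexed by header field name, and the message has no header field called display-name, so the lookup returns None.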
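
To make the lookup behaviour concrete, here is a minimal sketch. The address john.smith@example.com is a placeholder chosen for illustration, not the one from the question, and the variable names are arbitrary.

```python
from email.parser import Parser

# Placeholder address, assumed for this sketch only.
msg = Parser().parsestr('From: "John Smith" <john.smith@example.com>')

field_names = msg.keys()        # header field names actually present
from_value = msg['From']        # header lookup is case-insensitive
missing = msg['display-name']   # no such header field, so this is None

print(field_names)
print(from_value)
print(missing)
```

The parser only knows about whole header fields. Splitting the value of From: into its parts is a separate step.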

Try email.utils.parseaddr:

In [17]: email.utils.parseaddr("[email protected]")
Out[17]: ('', '[email protected]')

In [18]: email.utils.parseaddr("(John Smith) [email protected]")
Out[18]: ('John Smith', '[email protected]')

In [19]: email.utils.parseaddr("John Smith <[email protected]>")
Out[19]: ('John Smith', '[email protected]')
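
Putting the two pieces together, a rough sketch of going from raw message text to a (display-name, address) pair could look like the following. split_from_header is a hypothetical helper name, not part of the standard library, and the address is again a placeholder.

```python
from email.parser import Parser
from email.utils import parseaddr


def split_from_header(raw_message):
    """Return (display_name, address) from the message's From: field.

    Hypothetical helper for illustration; not a standard-library function.
    """
    # Only the headers are needed, so skip parsing the body.
    msg = Parser().parsestr(raw_message, headersonly=True)
    # Fall back to an empty string so parseaddr always receives text.
    from_value = msg.get('From', '')
    return parseaddr(from_value)


# Placeholder address, assumed for this sketch only.
name, address = split_from_header('From: "John Smith" <john.smith@example.com>')
print(name)
print(address)
```

If the message has no From: field, this sketch returns a pair of empty strings, which is also what parseaddr returns when it cannot parse its input.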

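It also handles your unusual address. Note that the literal below is not a raw string, so Python processes the backslash escapes (\" and \\) before parseaddr sees the text: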

In [21]: email.utils.parseaddr('''"unusual" <"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com>''')
Out[21]: ('unusual', '"very.(),:;<>[]".VERY."very@ "very".unusual"@strange.example.com')
Robᵩ answered Feb 23 '23 20:02