
Parsing forwarded emails

Tags: python, rfc

I'm writing some code to parse forwarded emails. What I'm not sure about is whether there is a Python library, an RFC I could stick to, or some other resource that would let me automate the task.

To be precise, I don't know if the "layout" of forwarded emails is covered by some standard or recommendation, or if it has just evolved over the years so now most email clients produce similar output for the text part:

    Begin forwarded message: 

    > From: Me <[email protected]>
    > Date: January 30, 2010 18:26:33 PM GMT+02:00
    > To: Other Me <[email protected]>
    > Subject: Unwise question

and then go wild with attachments (and whatever other MIME sections may be present).

If that's still not precise enough, I'll clarify. It's just that I'm not 100% sure what to ask about (an RFC, a Python library, a convention, or something else).

Tomasz Zieliński asked Jan 30 '10 17:01



2 Answers

Contrary to what many other people have said, there is a standard for forwarded emails: RFC 2046, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", which is more than ten years old. See especially its section 5.2, "Message Media Type".

The basic idea behind RFC 2046 is to encapsulate one message in a MIME part of another, with the (unfortunately named) type message/rfc822 (never forget that MIME is recursive). Python's MIME support (the standard email package) handles it fine.
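
As a rough sketch of the RFC 2046 case, the standard library's email package can walk a message and pull out any encapsulated message/rfc822 parts. The sample message, its boundary string, and the example.com addresses below are made up for illustration, and `forwarded_messages` is a hypothetical helper name, not part of any library.

```python
from email import message_from_string, policy

# Hypothetical forwarded message; addresses and boundary are placeholders.
RAW = """\
From: Forwarder <forwarder@example.com>
To: Recipient <recipient@example.com>
Subject: Fwd: Unwise question
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

See the attached message.

--XYZ
Content-Type: message/rfc822

From: Me <me@example.com>
To: Other Me <other@example.com>
Subject: Unwise question
Content-Type: text/plain; charset="us-ascii"

Original body text.

--XYZ--
"""


def forwarded_messages(msg):
    """Yield every message embedded as a message/rfc822 part."""
    # walk() descends into nested parts, so forwards of forwards are found too.
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the encapsulated message object.
            yield part.get_payload(0)


outer = message_from_string(RAW, policy=policy.default)
inner = next(forwarded_messages(outer))
print(inner["Subject"])
print(inner.get_content().strip())
```

The inner object is a full message in its own right, so its headers and body can be read the same way as the outer message's.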

I did not downvote the other answers because they are right in one respect: the standard is not followed by every mailer. For instance, the mutt mailer can forward a message in RFC 2046 format but also in an ad hoc format. So, in practice, a mailer probably cannot handle only RFC 2046; it also has to parse the various other, underspecified syntaxes.
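
For the ad hoc formats there is nothing to rely on but heuristics. The following is only a sketch for the one layout quoted in the question (a "Begin forwarded message:" marker followed by "> Header: value" lines). The marker text, the regular expressions, and the `parse_inline_forward` name are assumptions for illustration; other clients use different markers and header names.

```python
import re

# Assumption: marker and "> Header: value" layout match the sample in the
# question; this will not recognise other clients' forwarding styles.
MARKER = re.compile(r"^\s*Begin forwarded message:\s*$", re.IGNORECASE)
HEADER = re.compile(r"^\s*>?\s*([A-Za-z][A-Za-z-]*):\s*(.*)$")


def parse_inline_forward(text):
    """Return the quoted header block after the marker as a dict, or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if MARKER.match(line):
            break
    else:
        return None  # no marker found; not this ad hoc format

    headers = {}
    for line in lines[i + 1:]:
        if not line.strip():
            if headers:
                break  # blank line after the headers ends the block
            continue  # skip blank lines between marker and headers
        match = HEADER.match(line)
        if not match:
            break  # first non-header line ends the block
        headers[match.group(1)] = match.group(2).strip()
    return headers


# Placeholder addresses; the layout follows the sample in the question.
sample = """\
Begin forwarded message:

> From: Me <me@example.com>
> Date: January 30, 2010 18:26:33 PM GMT+02:00
> To: Other Me <other@example.com>
> Subject: Unwise question

Body of the forwarded message.
"""

print(parse_inline_forward(sample))
```

A practical parser would probably try the message/rfc822 route first and fall back to heuristics like this one only when no such part exists.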

bortzmeyer answered Sep 28 '22 21:09


In my experience, just about every email client forwards and replies differently. Typically you'll have a plain-text version and an HTML version as MIME parts at the bottom of the message. Mail headers do have an RFC (RFC 2822, http://www.faqs.org/rfcs/rfc2822.html), but unfortunately the content of the message body is outside its scope.

Not only do you have to contend with variance between mail clients, but also with variance in user preferences. As an example: Lotus Notes puts replies at the top and Thunderbird puts replies at the bottom. So when a Thunderbird user replies to a Lotus Notes user's reply, they might insert their reply at the top and leave their signature at the bottom.

Another pitfall may be contending with word wrapping in reply chains.

    >>>> The outer reply that goes over the limit and is word wrapped by
    the middle replier's mail client\n
    >> The message body of a middle reply
    > Previous reply
    Newest reply
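
To make the wrapping problem concrete, here is a small sketch that counts leading ">" markers per line on a cleaned-up copy of the chain above. The `quote_depth` helper and its regular expression are illustrative assumptions, not an established algorithm.

```python
import re

# Matches any run of leading '>' markers, each optionally followed by a space.
QUOTE_PREFIX = re.compile(r"^((?:>\s?)*)")


def quote_depth(line):
    """Count the leading '>' markers on a line."""
    return QUOTE_PREFIX.match(line).group(1).count(">")


chain = """\
>>>> The outer reply that goes over the limit and is word wrapped by
the middle replier's mail client
>> The message body of a middle reply
> Previous reply
Newest reply"""

for line in chain.splitlines():
    print(quote_depth(line), line)
```

The wrapped continuation line comes out at depth 0, the same as the newest reply, so quote markers alone cannot tell the two apart.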

I wouldn't parse the message; I'd leave it to the user to parse in their head. Or I'd borrow the code from another project.

ryan v answered Sep 28 '22 21:09