I have a rails application which processes incoming emails via IMAP. Currently a method is used that searches the parts of a TMail object for a given content_type:
def self.search_parts_for_content_type(parts, content_type = 'text/html')
parts.each do |part|
if part.content_type == content_type
return part.body
else
if part.multipart?
if body = self.search_parts_for_content_type(part.parts, content_type)
return body
end
end
end
end
return false
end
These emails are generally in response to a html email it sent out in the first place. (The original outbound email is never the same.) The body text the method above returns contains the full history of the email and I would like to just parse out the reply text.
I'm wondering whether it's reasonable to place some '---please reply above this line---' text at the top of the mail as I have seen in a 37 signals application.
Is there another way to ignore the client specific additions to the email, other than write a multitude of regular expressions (which I haven't yet attempted) for each and every mail client? They all seem to tack on their own bit at the top of any replies.
I have to do email reply parsing on a project I'm working on right now. I ended up using pattern matching to identify the response part, so users wouldn't have to worry about where to insert their reply.
The good news is that the implementation really isn't too difficult. The hard part is just testing all the different email clients and services you want to support and figuring out how to identify each one. Generally, you can use either the message ID or the X-Mailer or Return-Path header to determine where an incoming email came from.
Here's a method that takes a TMail object and extracts the response part of the message and returns that along with the email client/service it was sent from. It assumes you have the original message's From: name and address in the constants FROM_NAME
and FROM_ADDRESS
.
def find_reply(email)
message_id = email.message_id('')
x_mailer = email.header_string('x-mailer')
# For optimization, this list could be sorted from most popular to least popular email client/service
rules = [
[ 'Gmail', lambda { message_id =~ /.+gmail\.com>\z/}, /^.*#{FROM_NAME}\s+<#{FROM_ADDRESS}>\s*wrote:.*$/ ],
[ 'Yahoo! Mail', lambda { message_id =~ /.+yahoo\.com>\z/}, /^_+\nFrom: #{FROM_NAME} <#{FROM_ADDRESS}>$/ ],
[ 'Microsoft Live Mail/Hotmail', lambda { email.header_string('return-path') =~ /<.+@(hotmail|live).com>/}, /^Date:.+\nSubject:.+\nFrom: #{FROM_ADDRESS}$/ ],
[ 'Outlook Express', lambda { x_mailer =~ /Microsoft Outlook Express/ }, /^----- Original Message -----$/ ],
[ 'Outlook', lambda { x_mailer =~ /Microsoft Office Outlook/ }, /^\s*_+\s*\nFrom: #{FROM_NAME}.*$/ ],
# TODO: other email clients/services
# Generic fallback
[ nil, lambda { true }, /^.*#{FROM_ADDRESS}.*$/ ]
]
# Default to using the whole body as the reply (maybe the user deleted the original message when they replied?)
notes = email.body
source = nil
# Try to detect which email service/client sent this message
rules.find do |r|
if r[1].call
# Try to extract the reply. If we find it, save it and cancel the search.
reply_match = email.body.match(r[2])
if reply_match
notes = email.body[0, reply_match.begin(0)]
source = r[0]
next true
end
end
end
[notes.strip, source]
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With