Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

While processing an email reply how can I ignore any email client specifics & the history?

I have a rails application which processes incoming emails via IMAP. Currently a method is used that searches the parts of a TMail object for a given content_type:

def self.search_parts_for_content_type(parts, content_type = 'text/html')
    parts.each do |part|
      if part.content_type == content_type
        return part.body
      else
        if part.multipart?
          if body = self.search_parts_for_content_type(part.parts, content_type)
            return body
          end
        end
      end
    end

    return false
 end

These emails are generally in response to a html email it sent out in the first place. (The original outbound email is never the same.) The body text the method above returns contains the full history of the email and I would like to just parse out the reply text.

  1. I'm wondering whether it's reasonable to place some '---please reply above this line---' text at the top of the mail as I have seen in a 37 signals application.

  2. Is there another way to ignore the client specific additions to the email, other than write a multitude of regular expressions (which I haven't yet attempted) for each and every mail client? They all seem to tack on their own bit at the top of any replies.

like image 814
tsdbrown Avatar asked May 05 '09 10:05

tsdbrown


1 Answers

I have to do email reply parsing on a project I'm working on right now. I ended up using pattern matching to identify the response part, so users wouldn't have to worry about where to insert their reply.

The good news is that the implementation really isn't too difficult. The hard part is just testing all the different email clients and services you want to support and figuring out how to identify each one. Generally, you can use either the message ID or the X-Mailer or Return-Path header to determine where an incoming email came from.

Here's a method that takes a TMail object and extracts the response part of the message and returns that along with the email client/service it was sent from. It assumes you have the original message's From: name and address in the constants FROM_NAME and FROM_ADDRESS.

def find_reply(email)
  message_id = email.message_id('')
  x_mailer = email.header_string('x-mailer')

  # For optimization, this list could be sorted from most popular to least popular email client/service
  rules = [
    [ 'Gmail', lambda { message_id =~ /.+gmail\.com>\z/}, /^.*#{FROM_NAME}\s+<#{FROM_ADDRESS}>\s*wrote:.*$/ ],
    [ 'Yahoo! Mail', lambda { message_id =~ /.+yahoo\.com>\z/}, /^_+\nFrom: #{FROM_NAME} <#{FROM_ADDRESS}>$/ ],
    [ 'Microsoft Live Mail/Hotmail', lambda { email.header_string('return-path') =~ /<.+@(hotmail|live).com>/}, /^Date:.+\nSubject:.+\nFrom: #{FROM_ADDRESS}$/ ],
    [ 'Outlook Express', lambda { x_mailer =~ /Microsoft Outlook Express/ }, /^----- Original Message -----$/ ],
    [ 'Outlook', lambda { x_mailer =~ /Microsoft Office Outlook/ }, /^\s*_+\s*\nFrom: #{FROM_NAME}.*$/ ],

    # TODO: other email clients/services

    # Generic fallback
    [ nil, lambda { true }, /^.*#{FROM_ADDRESS}.*$/ ]
  ]

  # Default to using the whole body as the reply (maybe the user deleted the original message when they replied?)
  notes = email.body
  source = nil

  # Try to detect which email service/client sent this message
  rules.find do |r|
    if r[1].call
      # Try to extract the reply.  If we find it, save it and cancel the search.
      reply_match = email.body.match(r[2])
      if reply_match
        notes = email.body[0, reply_match.begin(0)]
        source = r[0]
        next true
      end
    end
  end

  [notes.strip, source]
end
like image 191
Keith Platfoot Avatar answered Oct 22 '22 00:10

Keith Platfoot