 

How to Identify Whether an Email Belongs to an Existing Thread or Conversation

We have an internal .NET case management application that automatically creates a new case from an email. I want to be able to identify other emails that are related to the original email so we can prevent duplicate cases from being created.

I have observed that many, but not all, emails have a Thread-Index header that looks useful.

Does anybody know of a straightforward algorithm or package that we could use?

asked Nov 13 '08 23:11 by Knobbywheels




2 Answers

As far as I know, there's not going to be a 100% foolproof solution, as not all email clients or gateways preserve or respect all headers.

However, you'll get a pretty high hit rate with the following:

  • Every email message should have a unique "Message-ID" field. Find this, and keep a record of it as part of the case. (See RFC 822, since superseded by RFC 5322.)

  • If you receive two messages with the same Message-ID, discard the second one as it's a duplicate.

  • Check for the "In-Reply-To" field; if the ID it contains matches a known Message-ID, then you know the email is related.

  • The "References" and "Original-Message-ID" headers have similar meanings and can be checked in the same way.
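The header checks above can be sketched in Python using the standard library's email package. This is only a minimal sketch under stated assumptions, not a definitive implementation: `find_case`, `related_ids`, and `known_ids` (a mapping from Message-ID to case ID that your application would persist) are hypothetical names introduced for illustration.

```python
import re
from email import message_from_string

# Message-IDs look like <local@domain>; References may hold several of them.
MSG_ID = re.compile(r"<[^<>\s]+>")

def related_ids(msg):
    """Return the Message-IDs this message points back to, nearest parent first."""
    ids = []
    for header in ("In-Reply-To", "References", "Original-Message-ID"):
        found = MSG_ID.findall(msg.get(header, "") or "")
        # References lists the oldest ancestor first, so reverse it.
        if header == "References":
            found.reverse()
        ids.extend(found)
    return ids

def find_case(raw_email, known_ids):
    """Classify an incoming message as 'duplicate', 'related' or 'new'.

    known_ids is a hypothetical dict mapping Message-ID -> case ID.
    """
    msg = message_from_string(raw_email)
    own_id = (msg.get("Message-ID") or "").strip()
    if own_id and own_id in known_ids:
        return "duplicate", known_ids[own_id]
    for ref in related_ids(msg):
        if ref in known_ids:
            if own_id:
                # Remember the reply so later replies to it also match.
                known_ids[own_id] = known_ids[ref]
            return "related", known_ids[ref]
    return "new", None
```

A message classified as "new" would create a case and have its own Message-ID recorded by the caller; messages with no usable headers still fall through to "new", which is why the manual merge tools mentioned further down remain useful.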

If your system ever generates emails, include a case ID in the subject line in a form you can search for if you get an email back (e.g. [Case#20081114-01]); most people don't edit subject lines when replying.
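A subject-line tag like this can be written and read back with a small regular expression. The sketch below assumes the tag format follows the [Case#20081114-01] example (eight digits, a hyphen, then a sequence number); `case_id_from_subject` and `tag_subject` are hypothetical helper names.

```python
import re

# Matches tags such as [Case#20081114-01]; the exact format is an assumption
# modelled on the example in the text.
CASE_TAG = re.compile(r"\[Case#(\d{8}-\d+)\]", re.IGNORECASE)

def case_id_from_subject(subject):
    """Return the case ID embedded in a subject line, or None if absent."""
    match = CASE_TAG.search(subject or "")
    return match.group(1) if match else None

def tag_subject(subject, case_id):
    """Add the case tag to an outgoing subject unless it is already there."""
    if case_id_from_subject(subject) == case_id:
        return subject
    return f"[Case#{case_id}] {subject}"
```

Because the search is not anchored to the start of the subject, prefixes such as "RE:" or "FW:" added by mail clients do not prevent a match.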

The internet standards RFC 822, RFC 2076 and RFC 4021 may be useful further reading.

Given that there will always be messages that are missed (for whatever reason), you'll also probably want related features in your case management system - say, "Close as Duplicate Case" or "Merge with Duplicate Case", along with tools to make it easier to find duplicates.

answered Sep 29 '22 09:09 by Bevan


Use the JWZ threading algorithm.
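For orientation, the central step of that algorithm, as described in Jamie Zawinski's write-up, is to link each message under the IDs in its References chain, creating placeholders for messages that were never received. The following is a heavily simplified sketch of that one idea, not the full algorithm: it omits the pruning of empty containers and the subject-based grouping, it keeps an existing parent link instead of overriding it, and `thread_roots` and its tuple input format are hypothetical.

```python
def thread_roots(messages):
    """Group messages into threads by following reference chains.

    messages: iterable of (message_id, [referenced ids, oldest first]).
    Returns a dict mapping each thread's root ID to the set of IDs in it.
    """
    parent = {}  # message ID -> parent message ID (None for a root)

    def ancestors(msg_id):
        """Yield msg_id followed by every ID above it."""
        while msg_id is not None:
            yield msg_id
            msg_id = parent.get(msg_id)

    for msg_id, refs in messages:
        chain = list(refs) + [msg_id]
        parent.setdefault(chain[0], None)
        for older, newer in zip(chain, chain[1:]):
            parent.setdefault(newer, None)
            # Keep an existing link, and never create a loop.
            if parent[newer] is None and newer not in ancestors(older):
                parent[newer] = older

    threads = {}
    for msg_id in parent:
        root = list(ancestors(msg_id))[-1]
        threads.setdefault(root, set()).add(msg_id)
    return threads
```

In a case management setting, each root would correspond to one case, and a new message joins whichever case its root already belongs to.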

answered Sep 29 '22 08:09 by geocar