Is there a Java library that can take an email, compare it to a database of emails, and find other emails that might belong to the same "thread", similar to how mailing lists group messages?
I don't know of any library that does this, but you can do it yourself by looking at the header values in the email. Several headers are added when someone replies to a message. The relevant ones are:
Message-ID: Every email carries a Message-ID header, which is a globally unique string of junk. Sometimes it's a GUID, but most of the time it's some combination of GUID and domain. The format doesn't matter; it's just some unique string.
In-Reply-To: Holds the Message-ID of the message that this email is a reply to.
References: May contain a list of the Message-IDs of all the messages in the chain from the current message back to the start of the thread. If the thread is very long, this list may be abbreviated in the middle, but the first and the last message should always be present. (Older mail software uses this field to identify other messages that the current message refers to.)
Thread-Index: Outlook uses its own Thread-Index header, which every email that is part of a single thread carries.
You can get at these headers using good old JavaMail, so it shouldn't be too hard to reconstruct threads this way. Unfortunately, there is no standardized equivalent of Thread-Index, so you can't count on every mail client setting one.
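As a rough illustration of the reconstruction step, here is a minimal sketch that links each message to every ID named in its In-Reply-To and References headers and then collects the connected groups. ThreadGrouper and Mail are hypothetical names made up for this example, not part of JavaMail. It assumes the header values have already been read into plain strings (for instance with JavaMail's getHeader(String), which returns a String[]) and that they contain only whitespace-separated message IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups emails into threads by following header links. */
final class ThreadGrouper {

    /** Hypothetical holder for the raw header values of one email (null when absent). */
    static final class Mail {
        final String messageId;
        final String inReplyTo;
        final String references;

        Mail(String messageId, String inReplyTo, String references) {
            this.messageId = messageId;
            this.inReplyTo = inReplyTo;
            this.references = references;
        }
    }

    // Union-find parent pointers, keyed by message ID.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative ID of the group that contains id. */
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited node straight at the root.
        String cur = id;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /** Splits a header value such as "<a@x> <b@x>" into individual IDs. */
    static List<String> ids(String headerValue) {
        List<String> out = new ArrayList<>();
        if (headerValue == null) {
            return out;
        }
        for (String token : headerValue.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    /** Maps a representative ID to the Message-IDs of the given mails in that thread. */
    static Map<String, List<String>> group(List<Mail> mails) {
        ThreadGrouper g = new ThreadGrouper();
        for (Mail m : mails) {
            if (m.messageId == null) {
                continue; // cannot link a message that has no ID
            }
            g.find(m.messageId);
            for (String ref : ids(m.inReplyTo)) {
                g.union(m.messageId, ref);
            }
            for (String ref : ids(m.references)) {
                g.union(m.messageId, ref);
            }
        }
        Map<String, List<String>> threads = new LinkedHashMap<>();
        for (Mail m : mails) {
            if (m.messageId != null) {
                threads.computeIfAbsent(g.find(m.messageId), k -> new ArrayList<>())
                       .add(m.messageId);
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        List<Mail> mails = Arrays.asList(
            new Mail("<1@example.com>", null, null),
            new Mail("<2@example.com>", "<1@example.com>", "<1@example.com>"),
            new Mail("<3@other.example>", null, null));
        // Prints one line per thread.
        group(mails).values().forEach(System.out::println);
    }
}
```

Because References can name messages that are missing from your database, the sketch links through those IDs as well, so two replies still end up in the same thread even when the message they both answer was never stored.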
http://people.dsv.su.se/~jpalme/ietf/message-threading.html
Stack Overflow post on Thread-Index: How does the email header field 'thread-index' work?
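For mail that does carry Thread-Index, a conversation key can be derived from the header value itself. The sketch below rests on an assumption that the answer above does not state: that the value is Base64 and decodes to a 22-byte header shared by the whole conversation, followed by one 5-byte block per reply, as Microsoft's conversation index documentation describes. ThreadIndexKey and conversationKey are hypothetical names for this example.

```java
import java.util.Base64;

/** Hypothetical helper: derives a per-conversation key from a Thread-Index value. */
final class ThreadIndexKey {

    // Assumed layout: 22-byte conversation header, then 5 bytes per reply.
    static final int HEADER_BYTES = 22;

    /**
     * Returns the first 22 decoded bytes as lowercase hex, or null when the
     * value is missing, is not valid Base64, or is shorter than the header.
     */
    static String conversationKey(String threadIndexHeader) {
        if (threadIndexHeader == null) {
            return null;
        }
        byte[] raw;
        try {
            // The MIME decoder tolerates whitespace and line breaks in the value.
            raw = Base64.getMimeDecoder().decode(threadIndexHeader.trim());
        } catch (IllegalArgumentException e) {
            return null;
        }
        if (raw.length < HEADER_BYTES) {
            return null;
        }
        StringBuilder hex = new StringBuilder(HEADER_BYTES * 2);
        for (int i = 0; i < HEADER_BYTES; i++) {
            hex.append(String.format("%02x", raw[i] & 0xff));
        }
        return hex.toString();
    }
}
```

Two messages whose keys are equal would then be treated as part of the same Outlook conversation. Messages without the header return null and have to be threaded by In-Reply-To and References instead.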