 

Stripping out original message from an email reply

Tags:

node.js

email

My application receives email from users. A reply sent from Gmail, for example, comes in like this:

This is some new text

On Sun, Apr 1, 2012 at 3:32 AM, My app <
[email protected]> wrote:

> Original...
> message..

Of course, this quoting format varies from client to client.

Right now I search for '4f77ed3860c258a567aeabf8' and throw out everything after it, since I know which email address the user replied to. This is not a general solution, but it works for my purposes, except when there's a line break in the "On ... wrote:" line, as in the example above: the part of that line before the break is left in the reply.
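A minimal sketch of the approach described above, plus one way to handle the wrapped header line. It assumes a Gmail-style attribution line that starts with "On " and wraps onto at most one extra line; the function names are hypothetical, and `example.com` is a placeholder domain because the real address is not shown in the question.

```javascript
// Unique part of the address the user replied to (from the question).
const TOKEN = '4f77ed3860c258a567aeabf8';

// Naive cut: keep everything before the token. If the client wrapped the
// "On ... wrote:" line, the first half of that header is left behind.
function stripNaive(body, token) {
  const idx = body.indexOf(token);
  return idx === -1 ? body.trim() : body.slice(0, idx).trim();
}

// Heuristic fix (an assumption, not a standard): after finding the token,
// check the line holding it and the line before it; if either starts with
// "On ", cut at the start of that line instead.
function stripAtAttribution(body, token) {
  const idx = body.indexOf(token);
  if (idx === -1) return body.trim();
  const lines = body.slice(0, idx).split('\n');
  const last = lines.length - 1;
  for (let i = last; i >= Math.max(0, last - 1); i--) {
    if (/^\s*On\s/.test(lines[i])) {
      return lines.slice(0, i).join('\n').trim();
    }
  }
  return lines.join('\n').trim();
}

// Sample reply shaped like the one in the question (placeholder domain).
const sample = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  TOKEN + '@example.com> wrote:',
  '',
  '> Original...',
  '> message..',
].join('\n');

console.log(JSON.stringify(stripNaive(sample, TOKEN)));
console.log(JSON.stringify(stripAtAttribution(sample, TOKEN)));
```

On the sample, the naive version leaves `On Sun, Apr 1, 2012 at 3:32 AM, My app <` behind, while the second version returns only `This is some new text`. It will misfire if the user's own last line happens to start with "On ".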

Is there a better, standard way to strip out past messages from a user's reply to an email?

ty. asked Apr 01 '12 18:04


2 Answers

There is an npm module called emailreplyparser, a port of GitHub's Ruby email_reply_parser library, that does this. As you point out, the formats used for quoting are not standardized, so any solution is going to be fragile and imperfect, but what can you do?
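To make the fragility concrete, here is a rough, library-free sketch of the kind of heuristic such parsers depend on: cut the body at the earliest line that looks like a reply header. The three patterns (a Gmail-style "On ... wrote:" attribution, an Outlook-style "-----Original Message-----" separator, and the first `>`-quoted line) are illustrative assumptions rather than a complete list, and this is not how emailreplyparser is implemented internally.

```javascript
// Illustrative reply-header patterns (assumptions; real clients vary widely).
const HEADER_PATTERNS = [
  // Gmail-style attribution, allowing one line break inside it
  /^On [^\n]*(?:\n[^\n]*)?wrote:[ \t]*$/m,
  // Outlook-style separator line
  /^-{2,}[ \t]*Original Message[ \t]*-{2,}/im,
  // First line quoted with ">" (optionally indented)
  /^[ \t]*>/m,
];

// Cut the body at the earliest match of any pattern.
function stripQuoted(body) {
  let cut = body.length;
  for (const re of HEADER_PATTERNS) {
    const m = re.exec(body);
    if (m !== null && m.index < cut) cut = m.index;
  }
  return body.slice(0, cut).trim();
}

const gmailReply = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  'reply@example.com> wrote:',
  '',
  '> Original...',
  '> message..',
].join('\n');

const outlookReply = [
  'Sounds good.',
  '',
  '-----Original Message-----',
  'From: My app',
  '',
  'Original text',
].join('\n');

console.log(stripQuoted(gmailReply));   // "This is some new text"
console.log(stripQuoted(outlookReply)); // "Sounds good."
```

Taking the earliest match errs toward discarding too much, which is usually safer than leaking the quoted original; the cost is that a user line that merely starts with ">" also ends the reply.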

Here's an example where I take a JSON response I got from the new Gmail API and successfully access just the new reply text of a given message.

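// EmailReplyParser.read splits a plain-text email into fragments
const erp = require('emailreplyparser').EmailReplyParser.read;
// a message resource saved from the Gmail API
const message = require('./sample_message.json');
// Gmail returns body data as base64url; Node's 'base64' decoder accepts that alphabet.
// Buffer.from replaces the deprecated new Buffer() constructor.
const buffer = Buffer.from(message.payload.parts[0].body.data, 'base64');
const body = buffer.toString('utf8');
// body is the whole message: the new text plus the quoted reply portion
// console.log(body);
const parsed = erp(body);
// the first fragment holds just the text of the reply itself
console.log(parsed.fragments[0].content);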

Note that there may be several relevant fragments if the author interleaved reply text with quoted portions of the original message.
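The fragment idea can be illustrated without the library. The sketch below is a simplified stand-in, not emailreplyparser's actual algorithm: it groups consecutive lines into fragments according to whether they start with `>`, then joins the unquoted ones, which is what you want when the author answers inline. The function names and the `quoted` flag are hypothetical.

```javascript
// Group consecutive lines into fragments, flagging ">"-quoted runs.
function splitFragments(body) {
  const fragments = [];
  for (const line of body.split('\n')) {
    const quoted = /^\s*>/.test(line);
    const current = fragments[fragments.length - 1];
    if (current && current.quoted === quoted) {
      current.lines.push(line);
    } else {
      fragments.push({ quoted, lines: [line] });
    }
  }
  return fragments.map(f => ({ quoted: f.quoted, content: f.lines.join('\n').trim() }));
}

// Keep only the author's own (unquoted, non-empty) fragments.
function visibleText(body) {
  return splitFragments(body)
    .filter(f => !f.quoted && f.content !== '')
    .map(f => f.content)
    .join('\n\n');
}

const interleaved = [
  '> Can you make Tuesday?',
  'Yes, Tuesday works.',
  '',
  '> What time?',
  'Any time after 2pm.',
].join('\n');

console.log(visibleText(interleaved));
```

This prints the two answers separated by a blank line. An unquoted attribution line such as "On ... wrote:" would survive as its own fragment, so in practice this is combined with header detection.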

Peter Lyons answered Oct 30 '22 22:10


If you want a more reliable way to remove everything except the most recent reply, compare the new message against the previous one character by character (you have the previous one, since your app sent it). If you don't want to write your own diff implementation, check out this library:

https://github.com/cemerick/jsdifflib

Or, if you want a lightweight algorithm, check out this one:

http://ejohn.org/projects/javascript-diff-algorithm/
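For a rough idea of the diff approach without either library, here is a sketch that compares the reply with the message your app originally sent, line by line, using a longest-common-subsequence table, and keeps only the reply lines that are not part of the common subsequence. It assumes you stored the sent text and that quoted lines carry a leading `>` marker, which is stripped before comparing; the function names are hypothetical, and a line-level diff is coarser than the character-level comparison described above.

```javascript
// Normalize a line for comparison: drop ">" quote markers and surrounding spaces.
function normalize(line) {
  return line.replace(/^(\s*>)+/, '').trim();
}

// Return the reply lines that are not in the LCS with the original message.
function stripByDiff(reply, original) {
  const a = reply.split('\n');
  const b = original.split('\n');
  const na = a.map(normalize);
  const nb = b.map(normalize);
  // dp[i][j] = LCS length of na[i..] and nb[j..]
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] = na[i] === nb[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  // Walk the table; reply lines that are not matched go into the result.
  const kept = [];
  let i = 0;
  let j = 0;
  while (i < a.length) {
    if (j < b.length && na[i] === nb[j]) {
      i++; j++;                       // line shared with the original
    } else if (j < b.length && dp[i][j + 1] >= dp[i + 1][j]) {
      j++;                            // skip a line that only the original has
    } else {
      kept.push(a[i]); i++;           // line that only the reply has
    }
  }
  return kept.join('\n').trim();
}

const sentMessage = 'Meeting moved to 3pm.\nReply to confirm.';
const userReply = [
  'Sounds good, see you then.',
  '',
  '> Meeting moved to 3pm.',
  '> Reply to confirm.',
].join('\n');

console.log(stripByDiff(userReply, sentMessage)); // "Sounds good, see you then."
```

Client-added lines such as "On ... wrote:" are not in the sent text, so they survive this comparison and still need separate handling.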

FlavorScape answered Oct 31 '22 00:10