Get the actual email message that the person just wrote, excluding any quoted text

There are two pre-existing questions on the site. One for Python, one for Java.

  • Java How to remove the quoted text from an email and only show the new text
  • Python Reliable way to only get the email text, excluding previous emails

I want to do pretty much exactly the same thing in PHP. I've created a mail proxy, where two people can correspond with each other by emailing a unique email address. The problem is that when a person receives the email and hits reply, I am struggling to accurately capture the text they have written and discard the quoted text from previous correspondence.

I'm trying to find a solution that will work for both HTML and plaintext emails, because I am sending both.

If it helps, I can also insert a <*****RESPOND ABOVE HERE*******> tag in the emails, so that I can discard everything below it.

What would you recommend I do? Always add that tag to the HTML copy and the plaintext copy then grab everything above it?

I would still need to know how each mail client formats the reply, because the client adds its own line above the quoted text. For example, Gmail would produce this:

On Wed, Nov 2, 2011 at 10:34 AM, Message Platform <[email protected]> wrote:
## In replies all text above this line is added to your message conversation ##

Any suggestions or recommendations of best practices?

Or should I just grab the 50 most popular mail clients and start writing a custom regex for each? Each of these clients also has a bazillion different locale settings, since I'm guessing the locale of the user will also influence what is added.

Or should I just always remove the preceding line if it contains a date, and so on?

Layke asked Nov 02 '11 10:11



1 Answer

Unfortunately, you're in for a world of hurt if you want to try to clean up emails meticulously (removing everything that's not part of the actual reply email itself). The ideal way would be to, as you suggest, write up regex for each popular email client/service, but that's a pretty ridiculous amount of work, and I recommend being lazy and dumb about it.

Interestingly enough, even Facebook engineers have trouble with this problem, and Google has a patent on a method for "Detecting quoted text".

There are three solutions you might find acceptable:

Leave It Alone

The first solution is to just leave everything in the message. Most email clients do this, and nobody seems to complain. Of course, online message systems (like Facebook's 'Messages') look pretty odd if they have inception-style replies. One sneaky way to make this work okay is to render the message with any quoted lines collapsed, and include a little link to 'expand quoted text'.
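As a rough illustration of the "collapse instead of delete" idea, here is a minimal sketch for plaintext bodies. The function name render_with_collapsed_quotes is made up for this example, and it assumes quoted lines start with > and that an HTML <details> element is an acceptable stand-in for the 'expand quoted text' link.

```php
<?php
/**
 * Render a plain-text body as HTML, wrapping each run of ">"-quoted
 * lines in a collapsed <details> element instead of deleting it.
 * (Hypothetical helper; the name and markup are assumptions.)
 */
function render_with_collapsed_quotes(string $body): string
{
    // Normalise CRLF / CR line endings so the line split is reliable.
    $body = str_replace(["\r\n", "\r"], "\n", $body);
    $html = '';
    $quoted = [];

    // Emit any buffered quoted lines as one collapsed block.
    $flush = function () use (&$html, &$quoted): void {
        if ($quoted !== []) {
            $html .= "<details><summary>expand quoted text</summary><pre>"
                . htmlspecialchars(implode("\n", $quoted))
                . "</pre></details>\n";
            $quoted = [];
        }
    };

    foreach (explode("\n", $body) as $line) {
        if (strncmp($line, '>', 1) === 0) {
            $quoted[] = $line;      // still inside a quoted run
        } else {
            $flush();               // a normal line ends the run
            $html .= htmlspecialchars($line) . "<br>\n";
        }
    }
    $flush();                       // the body may end with quoted lines
    return $html;
}
```

Nothing is thrown away here, so a wrong guess about what counts as quoted text only costs the reader a click.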

Separate the Reply from the Older Message

The second solution, as you mention, is to put a delineating message at the top of your messages, like --------- please reply above this line ----------, and then strip that line and anything below when processing the replies. Many systems do this, and it's not the worst thing in the world... but it does make your email look more 'automated' and less personal (in my opinion).
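A minimal sketch of that marker approach for the plaintext part might look like the following. The marker string and the function name text_above_marker are placeholders for this example; the marker has to match whatever your outgoing emails actually contain.

```php
<?php
// Hypothetical marker text; use the exact string your outgoing emails contain.
const REPLY_MARKER = 'please reply above this line';

/**
 * Return only the text above the first line that contains the marker.
 * If the marker is missing, the whole body is returned (trimmed).
 */
function text_above_marker(string $body, string $marker = REPLY_MARKER): string
{
    // Normalise CRLF / CR line endings so the line split is reliable.
    $body = str_replace(["\r\n", "\r"], "\n", $body);
    $kept = [];
    foreach (explode("\n", $body) as $line) {
        // Clients often prefix the marker line with "> ", so look for the
        // marker anywhere in the line, case-insensitively.
        if (stripos($line, $marker) !== false) {
            break;
        }
        $kept[] = $line;
    }
    return trim(implode("\n", $kept));
}
```

Note that this only cuts at the marker line itself. Any "On [date], [person] wrote:" line that the mail client inserts above the quoted block is still part of the result and needs separate handling.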

Strip Out Quoted Text

The last solution is to simply strip out any line beginning with a >, which is, presumably, a quoted line from the email being replied to. Most email clients mark quoted text this way in plaintext replies. Here's some regex (in PHP) that would do just that:

// Remove runs of ">"-quoted lines, plus an optional "...:" line directly above them.
$clean_text = preg_replace('/(^\w.+:\n)?(^>.*(\n|$))+/mi', '', $message_body);

There are some problems using this simpler method:

  • Many email clients also let people deliberately quote parts of earlier emails inline, and preface those lines with > as well, so you'll be stripping out quotes that belong to the reply.
  • Usually, there's a line above the quoted email with something like On [date], [person] said. This line is hard to remove, because it's not formatted the same among different email clients, and it may be one or two lines above the quoted text you removed. I've implemented this detection method, with moderate success, in my PHP Imap library.
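One way to soften both problems is to remove only the quoted block at the very end of the message, and then check the one or two lines above it for an attribution. The sketch below is an illustration only, not the code from the library mentioned above: the function name strip_trailing_quote is made up, and the pattern assumes an English "On ... wrote:" or "On ... said:" line, so other clients and locales would need their own patterns.

```php
<?php
/**
 * Strip a trailing quoted block and the attribution line above it.
 * Only ">"-quoted lines at the END of the message are removed, so quotes
 * the writer placed inline, between their own paragraphs, survive.
 * (Hypothetical helper; the attribution pattern is an English-only assumption.)
 */
function strip_trailing_quote(string $body): string
{
    $lines = explode("\n", str_replace(["\r\n", "\r"], "\n", rtrim($body)));

    // 1. Drop quoted and blank lines from the bottom up.
    $removedQuote = false;
    while ($lines !== []) {
        $last = end($lines);
        if (trim($last) === '') {
            array_pop($lines);
        } elseif (strncmp(ltrim($last), '>', 1) === 0) {
            array_pop($lines);
            $removedQuote = true;
        } else {
            break;
        }
    }

    // 2. If a quote was removed, test the last one or two remaining lines
    //    (some clients wrap the attribution) against the pattern.
    if ($removedQuote) {
        foreach ([1, 2] as $n) {
            $tail = trim(implode(' ', array_slice($lines, -$n)));
            if (preg_match('/^On\b.+\b(wrote|said):$/i', $tail)) {
                $lines = array_slice($lines, 0, max(0, count($lines) - $n));
                break;
            }
        }
    }
    return rtrim(implode("\n", $lines));
}
```

A reply whose own last line happens to start with "On" and end with "wrote:" would be cut by mistake, and a writer who replies below the quote instead of above it is not helped at all, so this is still a heuristic that needs testing against real mail.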

Of course, testing is key, and the tradeoffs might be worth it for your particular system. YMMV.

geerlingguy answered Sep 19 '22 08:09