
Processing e-mails that are word wrapped (Content-Type: text/plain)

Tags: php, email

I am trying to process e-mails into my application, and everything seems to work fine until I get an e-mail from a user whose mail server enforces a word wrap on the mail text. I know that the word wrap is part of an RFC specification, so I'm just looking for the best way to handle it and get a nicely displayed message.

Original E-mail:

Here is my main issue. When I email a message, the text is broken up rather oddly. It almost looks as though the message itself is broken. I'm not sure why this is the case though because my original email looks nothing like that.

Here is what the received e-mail looks like (marked with CRLF to show where mail server is inserting them):

Here is my main issue. When I email a message, the text is broken up rather CRLF
oddly. It almost looks as though the message itself is broken. I'm not sure CRLF
why this is the case though because my original email looks nothing like CRLF
that.

My processing code runs through the following and would then insert the result into the database.

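// Turn every newline into a <br /> tag before purifying.
$dirty_string = nl2br($dirty_string);
$config = HTMLPurifier_Config::createDefault();
// Drop empty elements, including ones that contain only &nbsp;.
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
// Allow only links, line breaks and paragraphs.
$config->set('HTML.Allowed', 'a[href],br,p');
$purifier = new HTMLPurifier($config);
$clean_string = $purifier->purify($dirty_string);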

The following is the result that gets displayed. If the div on my page is not wide enough for the line, the browser automatically word wraps it, but the line break from nl2br() then causes the next line to be short.

Here is my main issue. When I email a message, the text is
broken up rather
oddly. It almost looks as though the message itself is
broken. I'm not sure
why this is the case though because my original email looks
nothing like
that.

I thought that maybe I could just change double CRLFs to new paragraphs and strip every single CRLF, concatenating the lines into one line that the browser would then wrap correctly. But if someone sends the following bullet list in an e-mail, that would break the list.

This is my List CRLF
- Item 1 CRLF
- Item 2 CRLF
etc...

Any help would be greatly appreciated.

Matt D. asked Apr 04 '12 16:04



1 Answer

Mail parsing is probably the quintessential example of a problem that appears simple, but is actually filled with oddball edge cases that break simple parsers. However, it's also not exactly a new problem, so there are plenty of existing solutions that work fine. Some options:

  • Plancake
  • MailParse
  • PHP Mime Mail Parser (which wraps MailParse)
  • etc.

Maybe you've already written a great parser that just needs this one little change to be perfect, but more likely you'll save yourself much time and heartache by using the already existing tools to do the job.
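
If you do want to handle the re-wrapping yourself, here is a minimal sketch of one heuristic. It is only an illustration under stated assumptions, not how any of the libraries above work. The idea: a line break was probably inserted by the mail server only if the first word of the next line would not have fit on the previous line. Lines that start with a list marker are never joined, and blank lines stay as paragraph breaks. The function name unwrap_hard_wrapped() and the default width of 76 are my own assumptions; the width has to be tuned to whatever the sending servers actually use.

```php
<?php
// Hypothetical helper, not part of PHP or of any mail library.
// Heuristically undoes hard wrapping in a plain-text mail body.
function unwrap_hard_wrapped(string $text, int $wrapWidth = 76): string
{
    // Normalise CRLF and bare CR to LF so the text splits uniformly.
    $lines = explode("\n", str_replace(["\r\n", "\r"], "\n", $text));

    $out = [];        // finished logical lines
    $lastRawLen = 0;  // byte length of the previous physical line

    foreach ($lines as $raw) {
        $line = rtrim($raw);
        $canJoin = false;

        // Only consider joining two adjacent non-blank lines.
        if ($out !== [] && $line !== '' && end($out) !== '') {
            // "- x", "* x", "+ x", "1. x" and "1) x" always start a new line.
            $isListItem = preg_match('/^\s*(?:[-*+]|\d+[.)])\s/', $line) === 1;

            // Would the first word have fit on the previous line?
            // If so, the break was most likely typed by the sender.
            $firstWord = preg_split('/\s+/', ltrim($line), 2)[0];
            $wouldHaveFit = $lastRawLen + 1 + strlen($firstWord) <= $wrapWidth;

            $canJoin = !$isListItem && !$wouldHaveFit;
        }

        if ($canJoin) {
            $out[count($out) - 1] .= ' ' . ltrim($line);
        } else {
            $out[] = $line;
        }

        // strlen() counts bytes; multibyte text may need mb_strlen() instead.
        $lastRawLen = strlen($line);
    }

    return implode("\n", $out);
}
```

It would run before nl2br(), for example $dirty_string = nl2br(unwrap_hard_wrapped($dirty_string)); so that only the breaks judged intentional become <br /> tags. Being a heuristic, it will misjudge some messages, which is one more reason to prefer an existing parser where you can.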

blahdiblah answered Nov 07 '22 14:11
