
Parsing MIME mail, extracting binary attachment and text conversion

Tags:

haskell

I started using the mime package to parse email and extract attachments. Whatever I did, the binary attachment was always corrupted when I wrote it to disk. Then I realized that, for some reason, all the base64 attachments are already decoded when the message is parsed into the package's data types. That's where my problem starts.

If the attachment is an image, it does not work. The first thing I did was convert the extracted Text attachment to a ByteString with TE.encodeUtf8. No luck. I tried every Data.Text.Encoding function that converts Text to ByteString, and none of them worked. Then, on a whim, I encoded the extracted text back to base64 and decoded it again, and this time it worked. Why?

So if I encode the extracted attachment to base64 and decode it back, it works:

B.writeFile "tmp/test.jpg" $ B.pack $ decode $ encodeRawString True $ T.unpack attachment

Why does a simple Text-to-ByteString encoding not work, while this roundabout version does?

Eventually I played with it a bit more and got to the point where it works with Data.ByteString.Char8, like this:

B.writeFile "tmp/test.jpg" $ BC.pack $ T.unpack attachment

So I still have to convert Text to String, then String to a Data.ByteString.Char8 ByteString, and only then do I get an uncorrupted image.

Can someone please explain all this? Why is a binary image attachment such a pain? Why can't I convert base64-decoded Text to a ByteString? What am I missing?

Thank you.

UPDATE

This is the code that extracts the attachment, as requested. I didn't think it was relevant to text encoding/decoding.

{-# LANGUAGE OverloadedStrings #-}  -- lets the "" literals in mkAttach be Text

import Codec.MIME.Parse
import Codec.MIME.Type
import Data.Maybe
import Data.Text (Text, unpack, strip)
import qualified Data.Text as T (null)
import Data.Text.Encoding (encodeUtf8)
import Data.ByteString (ByteString)


data Attachment = Attachment { attName :: Text
                             , attSize :: Int
                             , attBody :: Text
                             } deriving (Show)


genAttach :: Text -> [Attachment]
genAttach m =
  let prs v = if isAttach v
              then [Just (mkAttach v)]
              else case mime_val_content v of
                     Single c -> if T.null c
                                 then [Nothing]
                                 else prs $ parseMIMEMessage c
                     Multi vs -> concatMap prs vs
  in catMaybes $ prs $ parseMIMEMessage m

isAttach :: MIMEValue -> Bool
isAttach mv =
  maybe False check $ mime_val_disp mv
    where check d = dispType d == DispAttachment

mkAttach :: MIMEValue -> Attachment
mkAttach v =
  let prms = dispParams $ fromJust $ mime_val_disp v
      Single cont = mime_val_content v
      name = check . filter isFn
        where isFn (Filename _) = True
              isFn _            = False
              check = maybe "" (\(Filename n) -> n) . listToMaybe
      size = check . filter isSz
        where isSz (Size _) = True
              isSz _        = False
              check = maybe "" (\(Size n) -> n) . listToMaybe
  in Attachment { attName = name prms
                , attSize = let s = size prms
                            in if T.null s then 0 else read $ unpack s
                , attBody = cont
                }
asked Oct 19 '22 19:10 by r.sendecky


1 Answer

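Note that the mime package represents decoded binary content as a Text value in which each Char stands for one byte. To get the corresponding ByteString, latin1-encode the text, that is, map each Char to the single byte with the same value. In this case all of the code points in the text are guaranteed to be in the range 0 to 255, so nothing is lost. TE.encodeUtf8 fails because UTF-8 writes every code point from 128 to 255 as two bytes, which corrupts every byte of the image at or above 0x80.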
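A minimal sketch of the difference, assuming only the bytestring and text packages; the names latin1Bytes and sampleBody are hypothetical. It builds a Text the way the parser would hold three decoded image bytes and prints what each conversion produces:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper: latin1-encode a Text, one byte per Char.
-- Assumes every code point is in 0..255; BC.pack would silently keep
-- only the low 8 bits of anything larger.
latin1Bytes :: T.Text -> B.ByteString
latin1Bytes = BC.pack . T.unpack

-- | Stand-in for a decoded attachment body: the JPEG start-of-image
-- marker (0xFF 0xD8) followed by the byte 0x41 ('A').
sampleBody :: T.Text
sampleBody = T.pack "\xFF\xD8\x41"

main :: IO ()
main = do
  -- One byte per Char: the original image bytes come back unchanged.
  print (B.unpack (latin1Bytes sampleBody))      -- [255,216,65]
  -- UTF-8 expands each code point in 128..255 to two bytes.
  print (B.unpack (TE.encodeUtf8 sampleBody))    -- [195,191,195,152,65]
```

The five UTF-8 bytes for three input Chars are exactly the kind of length change that corrupts an image written with TE.encodeUtf8.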

Create a file with this content:

Content-Type: image/gif
Content-Transfer-Encoding: base64

R0lGODlhAQABAIABAP8AAP///yH5BAEAAAEALAAAAAABAAEAAAICRAEAOw==

This is the base64 encoding of the 1x1 red GIF image at http://commons.wikimedia.org/wiki/File:1x1.GIF
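To know which bytes a correctly recreated file should contain, one option is to decode the same base64 string independently and compare. This sketch assumes the third-party base64-bytestring package (Data.ByteString.Base64.decode) and a hypothetical output name expected.gif; neither is part of the mime package:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64   -- base64-bytestring package (assumption)
import qualified Data.ByteString.Char8 as BC

-- The base64 payload from the sample message above.
gifBase64 :: B.ByteString
gifBase64 = BC.pack "R0lGODlhAQABAIABAP8AAP///yH5BAEAAAEALAAAAAABAAEAAAICRAEAOw=="

main :: IO ()
main = case B64.decode gifBase64 of
  Left err  -> putStrLn ("base64 error: " ++ err)
  Right gif -> do
    print (BC.take 6 gif)           -- "GIF89a", the GIF header
    B.writeFile "expected.gif" gif  -- reference copy to compare against
```

A byte-for-byte comparison (for example with cmp) between expected.gif and the files written by the code below shows whether the Text-to-ByteString step preserved every byte.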

Here is some code that uses parseMIMEMessage to recreate the image from this file.

import Codec.MIME.Parse
import Codec.MIME.Type

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.ByteString.Char8 as BS
import System.IO

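-- Write the Text through a handle whose encoding is latin1,
-- so each Char becomes exactly one byte.
test1 :: FilePath -> IO ()
test1 path = do
  msg <- TIO.readFile path
  let mval = parseMIMEMessage msg
      Single img = mime_val_content mval
  withBinaryFile "out-io" WriteMode $ \h -> do
    hSetEncoding h latin1
    TIO.hPutStr h img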

-- Latin1-encode in memory: Char8.pack keeps one byte per Char.
test2 :: FilePath -> IO ()
test2 path = do
  msg <- TIO.readFile path
  let mval = parseMIMEMessage msg
      Single img = mime_val_content mval
      bytes = BS.pack $ T.unpack img
  BS.writeFile "out-bs" bytes

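In test1 the handle's latin1 encoding does the conversion. In test2 the latin1 encoding is accomplished with BS.pack . T.unpack, because Data.ByteString.Char8.pack turns each Char into a single byte. This is the same conversion as the BC.pack $ T.unpack attachment line in the question.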
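Char8.pack silently truncates any code point above 255, so if there is any doubt that a Text really holds one byte per Char, a checked variant makes that assumption explicit. A minimal sketch; encodeLatin1Checked is a hypothetical name, not part of the mime, text, or bytestring APIs:

```haskell
import Data.Char (ord)
import qualified Data.ByteString as B
import qualified Data.Text as T

-- | Hypothetical helper: latin1-encode a Text, returning Nothing if any
-- code point is above 255 instead of truncating it like Char8.pack.
encodeLatin1Checked :: T.Text -> Maybe B.ByteString
encodeLatin1Checked t
  | T.all (\c -> ord c <= 255) t =
      Just (B.pack (map (fromIntegral . ord) (T.unpack t)))
  | otherwise = Nothing

main :: IO ()
main = do
  print (fmap B.unpack (encodeLatin1Checked (T.pack "\xFF\x00")))  -- Just [255,0]
  print (fmap B.unpack (encodeLatin1Checked (T.pack "\x20AC")))    -- Nothing (U+20AC, euro sign)
```

Applied to the question's code, the call would be encodeLatin1Checked (attBody att), writing the result with B.writeFile only when it is Just.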

answered Oct 22 '22 20:10 by ErikR