
Parsing inlineImages from Gmail raw content

The getAttachments function of a Gmail message does not return inline images; see issue 2810: https://code.google.com/p/google-apps-script-issues/issues/detail?id=2810

I need those images, so I wrote the code below. It parses the inline image out of the message's raw content and returns it as a blob, given that the image's cid within the message is known in advance.

However, I am afraid this parsing is quite fragile, particularly in how I locate the first and last characters of the base64 image content.

Is there a better way of doing this?

Regards, Fausto

var rawc = message.getRawContent();
var b64c1 = rawc.lastIndexOf(cid) + cid.length + 3; // first character in image base64
var b64cn = rawc.substr(b64c1).indexOf("--") - 3; // last character in image base64
var imgb64 = rawc.substring(b64c1, b64c1 + b64cn + 1); // is this fragile or safe enough?
var imgblob = Utilities.newBlob(Utilities.base64Decode(imgb64), "image/jpeg", cid); // decode and blob
Fausto R. asked May 28 '13 17:05


1 Answer

I've had this problem a number of times, and I think I have a fairly general solution. Getting non-embedded images (those referenced by URL rather than by cid) has also been a problem.

I'm not sure my parsing is any less fragile than yours. In the end, I'm pulling out one part of the multipart message by grabbing the surrounding lines that start with '--'. Everything else is just there so I can reuse the code without modifying it much the next time I need it. I have had some emails that don't seem to follow the \r\n line-ending convention and cause problems: something to look out for.
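One way to guard against mixed line endings is to normalize them before searching for the boundary lines. The following is only a sketch in plain JavaScript (no Apps Script services); normalizeLineEndings and extractPartAround are hypothetical helper names, not part of the code in this answer, and the sketch assumes the part of interest is delimited by lines beginning with '--'.

```javascript
// Hypothetical helper: rewrite lone \n or \r as \r\n so boundary searches behave consistently.
function normalizeLineEndings(text) {
  return text.replace(/\r\n|\r|\n/g, "\r\n");
}

// Hypothetical helper: return the MIME part (boundary line, headers, body) that contains
// `index`, delimited by the nearest lines starting with "--" before and after it.
// Returns null when no preceding boundary line is found.
function extractPartAround(raw, index) {
  var start = raw.lastIndexOf("\r\n--", index);
  if (start === -1) return null;
  var end = raw.indexOf("\r\n--", start + 1);
  if (end === -1) end = raw.length;
  return raw.substring(start + 2, end); // skip the leading \r\n
}

// Example with a made-up message that mixes \n and \r\n line endings.
var exampleRaw = normalizeLineEndings(
  "MIME-Version: 1.0\n\n--b1\r\nContent-ID: <img1>\n\niVBORw0KGgo=\n--b1--\n"
);
var examplePart = extractPartAround(exampleRaw, exampleRaw.indexOf("Content-ID:"));
```

Normalizing first matters because the indices are only meaningful against the normalized string. Base64 text never contains '-', so a '--' line inside a base64 body is not a concern, though it could be for parts using other encodings.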

The getInlineImages function will take the raw content of the message and return an array of objects. Each object will have the src of the img tag and the blob that goes with the image. If you just want inline images, you can choose to ignore anything that doesn't start with 'cid'.

The getBlobFromMessage function will take the raw content of the message and the src of the img tag (including 'cid') and return the associated blob.
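As a rough illustration of ignoring everything that is not a cid reference, the result array can be filtered on its src values. This is a sketch only; onlyCidImages is a hypothetical helper name, and the sample data below merely imitates the {src, blob} shape described above with placeholder blobs.

```javascript
// Hypothetical helper: keep only entries whose src is a cid: reference (true inline images).
function onlyCidImages(images) {
  return images.filter(function (img) {
    return /^cid:/i.test(img.src);
  });
}

// Stand-in data shaped like the getInlineImages result; the blobs are placeholders,
// not real Apps Script Blob objects, and the IDs and URL are made up.
var sampleImages = [
  {src: "cid:ii_example_1", blob: null},
  {src: "https://example.com/logo.png", blob: null}
];
var inlineOnly = onlyCidImages(sampleImages);
```

In a script this would presumably be called as onlyCidImages(getInlineImages(message.getRawContent())).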

You can see the code commented here.

function getInlineImages(rawContent) {
  var url = /^https?:\/\//, cid = /^cid:/;
  var imgtags = rawContent.match(/<img.*?>(.*?<\/img>)?/gi);
  return imgtags ? imgtags.map(function(imgTag) {
    var img = {src: Xml.parse(imgTag,true).html.body.img.src};
    img.blob = url.test(img.src) ? UrlFetchApp.fetch(img.src).getBlob()
             : cid.test(img.src) ? getBlobFromMessage(rawContent,img.src)
             : null;
    return img;
  }) : [];
}

function getBlobFromMessage(rawContent,src) {
  var cidIndex = src.search(/cid:/i);
  if(cidIndex === -1) throw Utilities.formatString("Did not find cid: prefix for inline reference: %s", src);

  var itemId = src.substr(cidIndex + 4);
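  // Escape regex metacharacters in the ID (e.g. "." or "$") and match the header name case-insensitively.
  var contentIdIndex = rawContent.search(new RegExp("Content-ID:.*?" + itemId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "i"));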
  if(contentIdIndex === -1) throw Utilities.formatString("Item with ID %s not found.",src);

  var previousBoundaryIndex = rawContent.lastIndexOf("\r\n--",contentIdIndex);
  var nextBoundaryIndex = rawContent.indexOf("\r\n--",previousBoundaryIndex+1);
  var part = rawContent.substring(previousBoundaryIndex,nextBoundaryIndex);

  var contentTransferEncodingLine = part.match(/Content-Transfer-Encoding:.*?\r\n/i)[0];
  var encoding = contentTransferEncodingLine.split(":")[1].trim();
  if(encoding != "base64") throw Utilities.formatString("Unhandled encoding type: %s",encoding);

  var contentTypeLine = part.match(/Content-Type:.*?\r\n/i)[0];
  var contentType = contentTypeLine.split(":")[1].split(";")[0].trim();

  var startOfBlob = part.indexOf("\r\n\r\n");
  // Strip all line breaks and whitespace; a string pattern would replace only the first match.
  var blobText = part.substring(startOfBlob).replace(/\s/g,"");
  return Utilities.newBlob(Utilities.base64Decode(blobText),contentType,itemId);
}
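The header handling in getBlobFromMessage can also be made a little more defensive, since match returns null when a header is missing and header names and values may vary in case. Below is a sketch in plain JavaScript under those assumptions; parseMimePart is a hypothetical helper name, and it does not handle folded (multi-line) headers.

```javascript
// Hypothetical helper: split one MIME part into its declared content type, transfer
// encoding (lowercased), and body with all whitespace removed.
function parseMimePart(part) {
  var split = part.indexOf("\r\n\r\n");
  if (split === -1) throw new Error("No blank line between headers and body.");
  var headers = part.substring(0, split);
  var body = part.substring(split + 4);

  // Read a header value up to the first ";" case-insensitively; null when absent.
  function header(name) {
    var m = headers.match(new RegExp("^" + name + ":[ \\t]*([^;\\r\\n]*)", "im"));
    return m ? m[1].trim() : null;
  }

  return {
    contentType: header("Content-Type"),
    encoding: (header("Content-Transfer-Encoding") || "").toLowerCase(),
    base64: body.replace(/\s+/g, "")
  };
}

// Example with a made-up part; the base64 payload is wrapped across two lines.
var parsedPart = parseMimePart(
  "--b1\r\nContent-Type: image/png; name=\"a.png\"\r\n" +
  "Content-Transfer-Encoding: BASE64\r\nContent-ID: <img1>\r\n\r\niVBORw0K\r\nGgo=\r\n"
);
```

In Apps Script, the result could then be passed on as Utilities.newBlob(Utilities.base64Decode(parsedPart.base64), parsedPart.contentType, itemId) after checking that parsedPart.encoding is "base64".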
fooby answered Sep 28 '22 08:09