
Parsing Multipart/Mixed with Multipart/Alternative body in Java

I'm getting emails from a client where they have nested a multipart/alternative message inside a multipart/mixed message. When I get the body of the message, it returns only the multipart/alternative part, when what I really want is the text/html part contained inside it.

I've looked through the Javadocs for javax.mail and can't find a simple way to get the body of a body part that is itself a multipart, or to skip the outer multipart/mixed part and go into the multipart/alternative body to read the text/html and text/plain pieces.

The email structure looks like this:

...
Content-Type: multipart/mixed; 
    boundary="----=_Part_19487_1145362154.1418138792683"

------=_Part_19487_1145362154.1418138792683
Content-Type: multipart/alternative; 
    boundary="----=_Part_19486_1391901275.1418138792683"

------=_Part_19486_1391901275.1418138792683
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1

...

------=_Part_19486_1391901275.1418138792683
Content-Transfer-Encoding: 7bit
Content-Type: text/html; charset=ISO-8859-1

...

------=_Part_19486_1391901275.1418138792683--

------=_Part_19487_1145362154.1418138792683--

This is an outline of the code used to parse the emails:

Message [] found = fldr.search(searchCondition);           
for (int i = 0; i < found.length; i++) {
    Message m = found[i];
    Object o = m.getContent();
    if (o instanceof Multipart) {
        log.info("**This is a Multipart Message.  ");
        Multipart mp = (Multipart)o;
        log.info("The Multipart message has " + mp.getCount() + " parts.");
        for (int j = 0; j < mp.getCount(); j++) {
            BodyPart b = mp.getBodyPart(j);

            // Loop if the content type is multipart then get the content that is in that part,
            // make it the new container and restart the loop in that part of the message.
            if (b.getContentType().contains("multipart")) {
                mp = (Multipart)b.getContent();
                j = 0;
                continue;
            }

            log.info("This content type is " + b.getContentType());

            if(!b.getContentType().contains("text/html")) {
                continue;
            }

            Object o2 = b.getContent();
            if (o2 instanceof String) {
                <do things with content here>
            }
        }
    }
}

It appears to keep stopping at the second boundary and not parsing anything further. In the case of the above message it stops at boundary="----=_Part_19486_1391901275.1418138792683" and never gets to the text of the message.

NGittlen asked Dec 09 '14 19:12


2 Answers

In this block:

if (b.getContentType().contains("multipart"))
{
    mp = (Multipart)b.getContent();
    j = 0;
    continue;
}

You set j to 0 and call continue, expecting the loop to start again at index 0. But continue still runs the loop's increment step (j++) before the next iteration, so the loop resumes at index 1, not 0, and the first part of the nested multipart is skipped.

Set j to -1 to solve your issue.

if (b.getContentType().contains("multipart"))
{
    mp = (Multipart)b.getContent();
    j = -1;
    continue;
}
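
To see the effect of the restart index in isolation, here is a minimal sketch that does not use JavaMail at all. The names RestartLoopDemo, Node, visit and sample are hypothetical stand-ins invented for this illustration; Node only mimics a part that has a content type and, for multiparts, child parts. The loop has the same shape as the one in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RestartLoopDemo {
    // Hypothetical stand-in for a MIME part (not a JavaMail class).
    static final class Node {
        final String type;
        final List<Node> children;

        Node(String type, List<Node> children) {
            this.type = type;
            this.children = children;
        }
    }

    // Walks the parts like the loop in the question and records every
    // non-multipart content type that the loop body actually reaches.
    static List<String> visit(Node root, int restartIndex) {
        List<String> seen = new ArrayList<>();
        List<Node> container = root.children;
        for (int j = 0; j < container.size(); j++) {
            Node part = container.get(j);
            if (part.type.startsWith("multipart/")) {
                container = part.children; // descend into the nested multipart
                j = restartIndex;          // j++ still runs after continue
                continue;
            }
            seen.add(part.type);
        }
        return seen;
    }

    // Same nesting as the email in the question:
    // multipart/mixed -> multipart/alternative -> text/plain, text/html
    static Node sample() {
        Node plain = new Node("text/plain", Collections.<Node>emptyList());
        Node html = new Node("text/html", Collections.<Node>emptyList());
        Node alt = new Node("multipart/alternative", Arrays.asList(plain, html));
        return new Node("multipart/mixed", Arrays.asList(alt));
    }

    public static void main(String[] args) {
        System.out.println(visit(sample(), 0));  // prints [text/html]
        System.out.println(visit(sample(), -1)); // prints [text/plain, text/html]
    }
}
```

With a restart index of 0 the first nested part (text/plain here) is never visited; with -1 both nested parts are. Note also that replacing the container inside the loop means any remaining siblings of the nested multipart in the outer container are never visited, so a recursive walk may be a better fit for messages that also carry attachments.
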
ToYonos answered Sep 19 '22 12:09


I tested your code and it failed for me as well.

In my case, b.getContentType() returns the type in uppercase (e.g. "TEXT/HTML; charset=UTF-8"), so the case-sensitive contains("text/html") check never matched. Converting the value to lowercase first made it work.

// Normalize case before comparing; the value may arrive as "TEXT/HTML; ...".
String contentType = b.getContentType().toLowerCase(Locale.ENGLISH);

if (!contentType.contains("text/html")) {
    continue;
}
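
A slightly stricter variant of the same idea, as a sketch: strip the parameters from the Content-Type value and compare only the lowercased type/subtype, so that a parameter which happens to contain "text/html" does not match. The class ContentTypeMatch and its methods baseType and isHtml are hypothetical helpers written for this example, and the parsing is deliberately simplistic (it just splits on the first semicolon). As far as I know, JavaMail's Part interface also offers isMimeType(String), which compares only the primary type and subtype, and may be preferable in real code.

```java
import java.util.Locale;

class ContentTypeMatch {
    // Returns the "type/subtype" portion of a Content-Type value,
    // lowercased and without parameters such as charset.
    static String baseType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ENGLISH);
    }

    // True only when the base type is exactly text/html, in any letter case.
    static boolean isHtml(String contentType) {
        return baseType(contentType).equals("text/html");
    }

    public static void main(String[] args) {
        System.out.println(isHtml("TEXT/HTML; charset=UTF-8"));             // prints true
        System.out.println(isHtml("text/plain; name=\"text/html.txt\""));   // prints false
    }
}
```

The second call shows the difference from a plain contains("text/html") check, which would have matched the name parameter.
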
Serdar Basegmez answered Sep 18 '22 12:09