Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to handle multipart/alternative mail with JavaMail?

I wrote an application which gets all emails from an inbox, filters the emails which contain a specific string and then puts those emails in an ArrayList.

After the emails are put in the List, I am doing some stuff with the subject and content of said emails. This works all fine for e-mails without an attachment. But when I started to use e-mails with attachments it all didn't work as expected anymore.

This is my code:

public void getInhoud(Message msg) throws IOException {
    try {
        cont = msg.getContent();
    } catch (MessagingException ex) {
        Logger.getLogger(ReadMailNew.class.getName()).log(Level.SEVERE, null, ex);
    }
    if (cont instanceof String) {
        String body = (String) cont;


    } else if (cont instanceof Multipart) {
        try {
            Multipart mp = (Multipart) msg.getContent();
            int mp_count = mp.getCount();
            for (int b = 0; b < 1; b++) {
                    dumpPart(mp.getBodyPart(b));
            }
        } catch (Exception ex) {
            System.out.println("Exception arise at get Content");
            ex.printStackTrace();
        }
    }
}

public void dumpPart(Part p) throws Exception {
    email = null;
    String contentType = p.getContentType();
    System.out.println("dumpPart" + contentType);
    InputStream is = p.getInputStream();
    if (!(is instanceof BufferedInputStream)) {
        is = new BufferedInputStream(is);
    }
    int c;
    final StringWriter sw = new StringWriter();
    while ((c = is.read()) != -1) {
        sw.write(c);
    }

    if (!sw.toString().contains("<div>")) {
        mpMessage = sw.toString();
        getReferentie(mpMessage);
    }
}

The content from the e-mail is stored in a String.

This code works all fine when I try to read mails without attachment. But if I use an e-mail with attachment the String also contains HTML code and even the attachment coding. Eventually I want to store the attachment and the content of an e-mail, but my first priority is to get just the text without any HTML or attachment coding.

Now I tried an different approach to handle the different parts:

public void getInhoud(Message msg) throws IOException {
    try {
        Object contt = msg.getContent();

        if (contt instanceof Multipart) {
            System.out.println("Met attachment");
            handleMultipart((Multipart) contt);
        } else {
            handlePart(msg);
            System.out.println("Zonder attachment");

        }
    } catch (MessagingException ex) {
        ex.printStackTrace();
    }
}

public static void handleMultipart(Multipart multipart)
        throws MessagingException, IOException {
    for (int i = 0, n = multipart.getCount(); i < n; i++) {
        handlePart(multipart.getBodyPart(i));
        System.out.println("Count "+n);
    }
}

 public static void handlePart(Part part)
        throws MessagingException, IOException {

    String disposition = part.getDisposition();
    String contentType = part.getContentType();
    if (disposition == null) { // When just body
        System.out.println("Null: " + contentType);
        // Check if plain
        if ((contentType.length() >= 10)
                && (contentType.toLowerCase().substring(
                0, 10).equals("text/plain"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                0, 9).equals("text/html"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                0, 9).equals("text/html"))) {
            System.out.println("Ook html gevonden");
            part.writeTo(System.out);
        }else{
            System.out.println("Other body: " + contentType);
            part.writeTo(System.out);
        }
    } else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
        System.out.println("Attachment: " + part.getFileName()
                + " : " + contentType);
    } else if (disposition.equalsIgnoreCase(Part.INLINE)) {
        System.out.println("Inline: "
                + part.getFileName()
                + " : " + contentType);
    } else {
        System.out.println("Other: " + disposition);
    }
}

This is what is returned from the System.out.printlns

Null: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Other body: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Content-Type: multipart/alternative; boundary="047d7b6220720b499504ce3786d7"

--047d7b6220720b499504ce3786d7
Content-Type: text/plain; charset="ISO-8859-1"

'Text of the message here in normal text'

--047d7b6220720b499504ce3786d7
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

'HTML code of the message'

This approach returns the normal text of the e-mail but also the HTML coding of the mail. I really don't understand why this happens, I've googled it but it seems like there is no one else with this problem.

Any help is appreciated,

Thanks!

like image 731
Jef Avatar asked Nov 11 '12 14:11

Jef


2 Answers

I found reading e-mail with the JavaMail library much more difficult than expected. I don't blame the JavaMail API, rather I blame my poor understanding of RFC-5322 -- the official definition of Internet e-mail.

As a thought experiment: Consider how complicated an e-mail message can become in the real world. It is possible to "infinitely" embed messages within messages. Each message itself may have multiple attachments (binary or human-readable text). Now imagine how complicated this structure becomes in the JavaMail API after parsing.

A few tips that may help when traversing e-mail with JavaMail:

  • Message and BodyPart both implement Part.
  • MimeMessage and MimeBodyPart both implement MimePart.
  • Where possible, treat everything as a Part or MimePart. This will allow generic traversal methods to be built more easily.

These Part methods will help to traverse:

  • String getContentType(): Starts with the MIME type. You may be tempted to treat this as a MIME type (with some hacking/cutting/matching), but don't. Better to only use this method inside the debugger for inspection.
    • Oddly, MIME type cannot be extracted directly. Instead use boolean isMimeType(String) to match. Read docs carefully to learn about powerful wildcards, such as "multipart/*".
  • Object getContent(): Might be instanceof:
    • Multipart -- container for more Parts
      • Cast to Multipart, then iterate as zero-based index with int getCount() and BodyPart getBodyPart(int)
        • Note: BodyPart implements Part
      • In my experience, Microsoft Exchange servers regularly provide two copies of the body text: plain text and HTML.
        • To match plain text, try: Part.isMimeType("text/plain")
        • To match HTML, try: Part.isMimeType("text/html")
    • Message (implements Part) -- embedded or attached e-mail
    • String (just the body text -- plain text or HTML)
      • See note above about Microsoft Exchange servers.
    • InputStream (probably a BASE64-encoded attachment)
  • String getDisposition(): Value may be null
    • if Part.ATTACHMENT.equalsIgnoreCase(getDisposition()), then call getInputStream() to get raw bytes of the attachment.

Finally, I found the official Javadocs exclude everything in the com.sun.mail package (and possibly more). If you need these, read the code directly, or generate the unfiltered Javadocs by downloading the source and running mvn javadoc:javadoc in the mail project module of the project.

like image 82
kevinarpe Avatar answered Oct 02 '22 16:10

kevinarpe


Did you find these JavaMail FAQ entries?

  • How do I read a message with an attachment and save the attachment?
  • How do I tell if a message has attachments?
  • How do I find the main message body in a message that has attachments?
like image 37
Bill Shannon Avatar answered Oct 02 '22 14:10

Bill Shannon