 

What is this Perl string encoded in?

Tags: encoding, perl

I'm using Mail::IMAPClient to retrieve mail headers from an IMAP server. It works great. But when a header contains any character other than [a-zA-Z0-9], I'm served strings that look like this:

  • Subject : Un message en =?UTF-8?B?ZnJhbsOnYWlzIMOgIGxhIGNvbg==?= (original string : "Un message en français à la con")

  • Body : =C3=A9aeio=C3=B9=C3=A8=C3=A8 (original string : éaeioùèè)

    1. What is this strange format? Is it the famous Perl "internal string" format?
    2. What is the safest way to handle human-language text coming from IMAP servers?
yPhil asked Nov 29 '22 15:11


1 Answer

The body encoding is Quoted-Printable; the header (subject) encoding is MIME "encoded-word" encoding ("B" type, meaning Base64). Neither one is Perl's internal string format; both are standard MIME encodings (RFC 2045 and RFC 2047). The best way to deal with both of them is to pass the email into a module that's capable of parsing MIME, such as Email::MIME or the older and buggier MIME::Parser (part of MIME-tools).

For example:

use Email::MIME;

# $message holds the raw message text retrieved from IMAP
my $mime = Email::MIME->new($message);
my $subject = $mime->header_str('Subject'); # encoded-words decoded to a character string
my $body = $mime->body_str; # transfer encoding and charset both decoded

However, if you need to deal with them outside the context of an entire message, there are also modules for each encoding on its own: Encode::MIME::Header for encoded-word headers and MIME::QuotedPrint for Quoted-Printable bodies.
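As a rough sketch of that standalone approach, the snippet below decodes the two sample strings from the question. It assumes the body's charset is UTF-8 (in a real message that comes from the Content-Type header), and the variable names are just for illustration. Both modules ship with core Perl; Encode::MIME::Header is loaded on demand when the 'MIME-Header' encoding name is requested through Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);                # 'MIME-Header' is backed by Encode::MIME::Header
use MIME::QuotedPrint qw(decode_qp);

# Sample values copied from the question.
my $raw_subject = 'Un message en =?UTF-8?B?ZnJhbsOnYWlzIMOgIGxhIGNvbg==?=';
my $raw_body    = '=C3=A9aeio=C3=B9=C3=A8=C3=A8';

# Encoded-words in a header value -> Perl character string.
my $subject = decode('MIME-Header', $raw_subject);

# decode_qp only undoes the transfer encoding and returns raw bytes;
# the charset (ASSUMED to be UTF-8 here) is decoded as a second step.
my $body = decode('UTF-8', decode_qp($raw_body));

# Encode on the way out so the terminal receives valid UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$subject\n";
print "$body\n";
```

Note the two-step decode for the body: Quoted-Printable says nothing about the character set, so skipping the second step leaves you with a byte string rather than characters.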

hobbs answered Dec 22 '22 08:12