
MIME encoded headers with extra '=' (==?utf-8?b?base64string?=)

This might be a silly question but... here it goes!

I wrote my own MIME parser in native C++. It's a nightmare with the encodings! It was stable for the last 3 months or so but recently I noticed this Subject: header.

Subject: =?UTF-8?B?T2ZpY2luYSBkZSBJbmZvcm1hY2nDs24sIEluaWNpYXRpdmFzIHkgUmVjbGFt?===?UTF-8?B?YWNpb25lcw==?=

which should decode to this:

Subject: Oficina de Información, Iniciativas y Reclamaciones

The problem is that there is one extra = (equals sign) in there, joining the two encoded elements, and I can't figure out why. I also don't understand why there are two elements, or why they are separated. In theory the format should be =?charset?encoding?encoded_string?=, but I found another subject that starts with two =.

==?UTF-8?B?blahblahlblah?=

How should I handle the extra =?

I could replace ==? with =? before doing anything else (which is what I'm doing now, and it works)... but I'm wondering if there's any kind of spec covering this, so I don't hack my way into proper functionality.
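One thing to watch with a blind replacement: the characters ==? also occur where base64 padding meets the closing ?=, as in YWNpb25lcw==?= in the subject above. The sketch below shows the effect; replace_all is a hypothetical helper written for illustration, not code from the parser in question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every occurrence of `from` in `s` with `to`.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    assert(!from.empty());  // an empty pattern would never advance
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the text just inserted
    }
    return s;
}
```

Applied to the tail of the subject, replace_all("YWNpb25lcw==?=", "==?", "=?") gives YWNpb25lcw=?=, so one padding character is lost. Whether that matters depends on how strict the base64 decoder is about padding; a lenient decoder still produces the same bytes, which may be why the workaround appears to work.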

PS: How much I hate these relic protocols! All text communications should be UTF-8 and XML :)

CodeAngry asked Jun 13 '13 18:06



1 Answer

MIME headers use encoded-words (RFC 2047, Section 2).

... (why 2?)

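To stay within the 75-character limit on an encoded-word. That limit follows from the line length limits: RFC 2047 allows at most 76 characters on a header line that contains encoded-words, and mail lines in general should stay within 78 characters. Splitting also lets a single header mix two different charsets (Chinese and Polish text, for example).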

RFC 2047:

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.
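As a rough sketch of how a sender ends up with two encoded-words, here is one way to split UTF-8 text into B-encoded words of at most 75 characters. The names encode_words and base64_encode are made up for this example, the charset is assumed to be UTF-8, and folding the words into header lines is left out. RFC 2047 also requires that a multi-octet character is not split across encoded-words, so the chunk boundary backs up when it lands inside a UTF-8 sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

static unsigned uc(char c) { return static_cast<unsigned char>(c); }

// Standard base64 with '=' padding.
static std::string base64_encode(const std::string& in) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups
        const unsigned n = (uc(in[i]) << 16) | (uc(in[i + 1]) << 8) | uc(in[i + 2]);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    if (i + 1 == in.size()) {  // one byte left: two symbols, two pads
        const unsigned n = uc(in[i]) << 16;
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == in.size()) {  // two bytes left: three symbols, one pad
        const unsigned n = (uc(in[i]) << 16) | (uc(in[i + 1]) << 8);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Split UTF-8 text into "=?UTF-8?B?...?=" words no longer than max_len.
std::vector<std::string> encode_words(const std::string& utf8, std::size_t max_len = 75) {
    const std::string prefix = "=?UTF-8?B?";
    const std::string suffix = "?=";
    // Need room for at least one 4-byte character (8 base64 symbols).
    assert(max_len >= prefix.size() + suffix.size() + 8);
    const std::size_t max_bytes = (max_len - prefix.size() - suffix.size()) / 4 * 3;

    std::vector<std::string> words;
    std::size_t pos = 0;
    while (pos < utf8.size()) {
        std::size_t end = std::min(pos + max_bytes, utf8.size());
        // Do not cut inside a UTF-8 sequence: back up over continuation bytes.
        while (end < utf8.size() && end > pos && (uc(utf8[end]) & 0xC0) == 0x80) --end;
        if (end == pos) end = std::min(pos + max_bytes, utf8.size());  // malformed input
        words.push_back(prefix + base64_encode(utf8.substr(pos, end - pos)) + suffix);
        pos = end;
    }
    return words;
}
```

With the default limit each word carries at most 45 bytes of text (60 base64 characters plus 12 characters of delimiters). Feeding it the subject from the question yields exactly the two encoded-words seen there, split after "Reclam".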

Here's the example from RFC 2047 (note that there is no '=' in between):

Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
  =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

Your subject should be decoded as:

"Oficina de Información, Iniciativas y Reclam=aciones"
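For illustration, a minimal decoder sketch that gives this result: it scans for encoded-words, decodes them, drops whitespace that sits between two adjacent encoded-words, and copies everything else through unchanged, including a stray '='. The function names are hypothetical, only the B encoding is handled, and the decoded bytes are returned without any charset conversion, so treat it as a sketch under those assumptions rather than a complete RFC 2047 implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lenient base64 decoder: stops at the first '=' and fails on any byte
// outside the standard alphabet.
static bool base64_decode(const std::string& in, std::string& out) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned buffer = 0;
    int bits = 0;
    out.clear();
    for (char c : in) {
        if (c == '=') break;
        const std::size_t v = alphabet.find(c);
        if (v == std::string::npos) return false;
        buffer = (buffer << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return true;
}

static bool is_lws(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }

// Try to read one "=?charset?B?text?=" starting at s[pos]. On success the
// decoded bytes go to `decoded` and `end` is set just past the closing "?=".
static bool parse_encoded_word(const std::string& s, std::size_t pos,
                               std::string& decoded, std::size_t& end) {
    assert(pos <= s.size());
    if (s.compare(pos, 2, "=?") != 0) return false;
    const std::size_t q1 = s.find('?', pos + 2);  // end of charset
    if (q1 == std::string::npos || q1 == pos + 2) return false;
    if (q1 + 2 >= s.size() || s[q1 + 2] != '?') return false;
    const int enc = std::toupper(static_cast<unsigned char>(s[q1 + 1]));
    const std::size_t close = s.find("?=", q1 + 3);
    if (close == std::string::npos) return false;
    if (enc != 'B') return false;  // Q encoding is not handled in this sketch
    if (!base64_decode(s.substr(q1 + 3, close - (q1 + 3)), decoded)) return false;
    end = close + 2;
    return true;
}

// Decode a header value; text that is not an encoded-word is kept as-is.
std::string decode_header_value(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    bool prev_was_word = false;
    while (i < s.size()) {
        std::string decoded;
        std::size_t end = 0;
        if (parse_encoded_word(s, i, decoded, end)) {
            out += decoded;
            i = end;
            prev_was_word = true;
            continue;
        }
        if (prev_was_word && is_lws(s[i])) {
            // Whitespace between two encoded-words is not part of the text.
            std::size_t j = i;
            while (j < s.size() && is_lws(s[j])) ++j;
            std::string ignored;
            std::size_t ignored_end = 0;
            if (parse_encoded_word(s, j, ignored, ignored_end)) {
                i = j;
                continue;
            }
        }
        out.push_back(s[i++]);  // literal character, e.g. the stray '='
        prev_was_word = false;
    }
    return out;
}
```

Calling decode_header_value on the subject from the question returns the UTF-8 bytes of "Oficina de Información, Iniciativas y Reclam=aciones". On the RFC 2047 example the CRLF SPACE between the two words is dropped, and the two halves join into one sentence.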

mraq's answer is incorrect. Soft line breaks apply only to the quoted-printable Content-Transfer-Encoding, which is used in MIME bodies, not in headers.

Pawel Lesnikowski answered Oct 18 '22 02:10
