Convert inline specified UTF-8 mail subject

Tags: php, email, encoding, utf-8

I want to convert the following raw mail subject to normal UTF-8 text:

=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=

The real text for that is:

Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet

My first approach to convert this:

$mime = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?=  =?utf-8?Q?eldet?=';
mb_internal_encoding("UTF-8");
echo mb_decode_mimeheader($mime);

This gives me the following result:

Schuker_hat_sich_vom_Übungsabend_(01.01.2012)_abgemeldet

(Questions here: What am I doing wrong? Why do those underscores occur?)

My second approach to convert this:

$mime = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?=  =?utf-8?Q?eldet?=';
echo imap_utf8($mime);

This gives me the following (correct) result:

Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet

Why does this work? Which method should I rely on?

The reason I ask is that I previously asked another question about decoding mail subjects, where mb_decode_mimeheader was the solution, whereas here imap_utf8 seems to be the way to go. How can I make sure that both of the following examples are decoded correctly?

=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=

and

=?UTF-8?B?UmU6ICMyLUZpbmFsIEFjY2VwdGFuY2UgdGVzdCB3aXRoIG5ldyB0ZXh0IHdpdGggU2xvdg==?= =?UTF-8?B?YWsgaW50ZXJwdW5jdGlvbnMgIivEvsWhxI3FpcW+w73DocOtw6khxYgi?=

These should give the following expected results:

Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet

and

Re: #2-Final Acceptance test with new text with Slovak interpunctions "+ľščťžýáíé!ň"

asked Feb 19 '12 16:02

hbit

4 Answers

Based on hbit's answer, I've improved the imapUtf8() function so that it converts the subject text to UTF-8 using the charset information of each part. The result looks something like this:

function imapUtf8($str){
    $convStr = '';
    $subLines = preg_split('/[\r\n]+/', $str); // split multi-line subjects
    for ($i=0; $i < count($subLines); $i++) {
        $convLine = '';
        $linePartArr = imap_mime_header_decode($subLines[$i]); // split and decode by charset
        for ($j=0; $j < count($linePartArr); $j++) {
            if ($linePartArr[$j]->charset === 'default') {
                // unencoded part: keep it unless it is a lone separator space
                if ($linePartArr[$j]->text != " ") {
                    $convLine .= ($linePartArr[$j]->text);
                }
            } else {
                // encoded part: convert from its declared charset to UTF-8
                $convLine .= iconv($linePartArr[$j]->charset, 'UTF-8', $linePartArr[$j]->text);
            }
        }
        $convStr .= $convLine; // append to whole subject
    }

    return $convStr;
}
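
The charset conversion matters as soon as a part is not already UTF-8. A minimal sketch of just that step, assuming the iconv extension is loaded; the $part object is a hand-built stand-in for one element returned by imap_mime_header_decode, not real decoder output:

```php
<?php
// Hand-built stand-in for one decoded part (an assumption for illustration):
// the text holds raw ISO-8859-1 bytes, 0xDC being "Ü" in that charset.
$part = (object) ['charset' => 'ISO-8859-1', 'text' => "\xDCbungsabend"];

// Same call as in the function above: declared charset in, UTF-8 out.
$utf8 = iconv($part->charset, 'UTF-8', $part->text);

echo $utf8, "\n";          // readable on a UTF-8 terminal
echo bin2hex($utf8), "\n"; // starts with c39c, the UTF-8 bytes for "Ü"
```

Without the conversion, the single byte 0xDC would be appended as-is and would not be valid UTF-8.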

answered Oct 06 '22 07:10

Gabriel Gcia Fdez

This function works for both examples:

function imapUtf8($str){
    $convStr = '';
    $subLines = preg_split('/[\r\n]+/',$str); // split multi-line subjects
    for($i=0; $i < count($subLines); $i++){ // go through lines
        $convLine = '';
        $linePartArr = imap_mime_header_decode(trim($subLines[$i])); // split and decode by charset
        for($j=0; $j < count($linePartArr); $j++){
            $convLine .= ($linePartArr[$j]->text); // append sub-parts of line together
        }
        $convStr .= $convLine; // append to whole subject
    }
    return $convStr; // return converted subject
}

Tests:

$sub1 = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?=  =?utf-8?Q?eldet?=';
$sub2 = '=?UTF-8?B?UmU6ICMyLUZpbmFsIEFjY2VwdGFuY2UgdGVzdCB3aXRoIG5ldyB0ZXh0IHdpdGggU2xvdg==?= =?UTF-8?B?YWsgaW50ZXJwdW5jdGlvbnMgIivEvsWhxI3FpcW+w73DocOtw6khxYgi?=';
echo imapUtf8($sub1), "\n";
echo imapUtf8($sub2), "\n";

Result:

Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet

Re: #2-Final Acceptance test with new text with Slovak interpunctions "+ľščťžýáíé!ň"

answered Oct 06 '22 09:10

hbit

The underscore problem is also mentioned in the user comments on the manual page for mb_decode_mimeheader, and I assume it is a bug. I could not find an existing report in the bug database, so I'd file a new one.

However, AFAIK imap_mime_header_decode will cope with both your encodings without a problem, so that will keep your code going.
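
For illustration only, here is roughly what such a decoder has to do with subjects like the two in the question, written with core PHP functions. decodeEncodedWords is a hypothetical helper, not a PHP built-in; it assumes every encoded-word is well-formed and already UTF-8 (it does no charset conversion), so for real mail imap_mime_header_decode remains the safer choice:

```php
<?php
// Hypothetical helper (not part of PHP): decodes RFC 2047 encoded-words
// of the form =?charset?B|Q?payload?= using core functions only.
// Assumption: all charsets are UTF-8/ASCII, so no conversion is done.
function decodeEncodedWords(string $header): string
{
    // Whitespace between two adjacent encoded-words is only a separator
    // and is not part of the decoded text, so drop it first.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header) ?? $header;

    $decoded = preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                // "B" encoding is plain base64
                return (string) base64_decode($m[3]);
            }
            // "Q" encoding: "_" means space, "=XX" is a hex-escaped byte
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );

    return $decoded ?? $header;
}

$sub1 = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?=  =?utf-8?Q?eldet?=';
echo decodeEncodedWords($sub1), "\n";
```

The sketch ignores the charset field entirely, which is exactly the part that a real decoder (or an iconv call on each part) has to take care of.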

answered Oct 06 '22 08:10

Wrikken

About the mysterious underscore in the Subject header field:

RFC2047 4.2(2) states explicitly:

The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be represented as "_" (underscore, ASCII 95.). (This character may not pass through some internetwork mail gateways, but its use will greatly enhance readability of "Q" encoded data with mail readers that do not support this encoding.) Note that the "_" always represents hexadecimal 20, even if the SPACE character occupies a different code position in the character set in use.

The encoding rules for the Subject line are documented in RFC 2047 itself.
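
As a rough sketch of that rule, assuming we already have the payload that sits between the ?Q? and ?= markers (decodeQPayload is a hypothetical name, not a PHP built-in):

```php
<?php
// Hypothetical helper: decodes the payload of a single "Q" encoded-word.
function decodeQPayload(string $payload): string
{
    // RFC 2047 4.2(2): "_" always stands for hexadecimal 20 (space).
    // This must happen before the hex escapes are decoded, otherwise a
    // literal underscore sent as "=5F" would be turned into a space too.
    $withSpaces = str_replace('_', ' ', $payload);

    // The remaining "=XX" escapes follow the quoted-printable rules.
    return quoted_printable_decode($withSpaces);
}

echo decodeQPayload('Schuker_hat_sich_vom_=C3=9Cbungsabend'), "\n";
```

A decoder that skips the first step leaves the underscores in the text, which matches the output seen in the question.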

answered Oct 06 '22 09:10

Jimm Chen
