 

C# Email subject parsing

I'm building a system for reading emails in C#. I've got a problem parsing the subject, a problem which I think is related to encoding.

The subject I'm reading is =?ISO-8859-1?Q?=E6=F8sd=E5f=F8sdf_sdfsdf?=, while the original subject that was sent is æøsdåføsdf sdfsdf (note the Norwegian characters).

Any ideas on how I can change the encoding or parse this correctly? So far I've tried using the C# encoding conversion methods to convert the subject to UTF-8, but without any luck.

Here is one of the solutions I tried:

Encoding iso = Encoding.GetEncoding("iso-8859-1");
Encoding utf = Encoding.UTF8;
string decodedSubject =
    utf.GetString(Encoding.Convert(utf, iso,
                                   iso.GetBytes(m.Subject.Split('?')[3])));
Asked by Kenneth on Dec 09 '22 at 13:12


1 Answer

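This is an RFC 2047 "encoded-word". The charset is ISO-8859-1, and the Q selects the "Q" encoding, a variant of quoted-printable used in mail headers.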

See the answers to this question.

Adapted from the accepted answer:

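// Requires: using System.Net.Mail;
public string DecodeQuotedPrintable(string value)
{
    // Attachment decodes an encoded-word passed as its name.
    using (Attachment attachment = Attachment.CreateAttachmentFromString("", value))
    {
        return attachment.Name;
    }
}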

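When passed the string =?ISO-8859-1?Q?=E6=F8sd=E5f=F8sdf_sdfsdf?= this returns "æøsdåføsdf_sdfsdf". Note that in the Q encoding an underscore stands for a space (RFC 2047), so if the underscore survives decoding, as it does here, replace it with a space to recover the original subject.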
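If you would rather not go through Attachment, the encoded-word format is simple enough to decode by hand. The following is only a minimal sketch, not a complete RFC 2047 implementation. The class name EncodedWordDecoder and its methods are made up for illustration. It assumes the charset name is one that Encoding.GetEncoding recognizes on your runtime (ISO-8859-1 and UTF-8 are; other code pages may need a registered encoding provider on newer .NET versions). It does not join adjacent encoded-words or handle malformed Base64.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EncodedWordDecoder
{
    // Matches =?charset?encoding?text?= where encoding is Q or B.
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?]+)\?([QqBb])\?([^?]*)\?=", RegexOptions.Compiled);

    // Replaces every encoded-word in the header value with its decoded text.
    public static string Decode(string header)
    {
        return EncodedWord.Replace(header, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)
                : DecodeQ(text);
            return charset.GetString(bytes);
        });
    }

    // Q encoding: '_' is a space, "=XX" is a hex-encoded byte,
    // anything else is taken as a literal ASCII byte.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

Calling EncodedWordDecoder.Decode(m.Subject) on the subject from the question should give back the text with a real space in place of the underscore, because the bytes are decoded first and only then interpreted in the declared charset.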

Answered by Oded on Dec 19 '22 at 09:12