
Converting "=?UTF 8?.." (RFC 2047) to a regular string in golang

I'm using an API that returns something like this for non-English text:

=?UTF 8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF 8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF 8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF 8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=

Is this a common format? How would I go about converting this to a regular string in golang?

Go usually handles multiple languages well, but I'm not sure how to convert this.

John asked Mar 08 '15 21:03


2 Answers

Apparently your API is returning data encoded in the RFC 2047 format, which defines an encoded-word as follows:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

This means your charset is UTF-8 (very handy, since this is Go's native character set) and your encoding is B, i.e. Base64. The text you have to decode is the part between the "B?" and the "?=" of each encoded-word. So all you have to do is take that text and call:

base64.StdEncoding.DecodeString(text)

to get the original UTF-8 string.
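As a rough illustration of the manual approach, here is a minimal sketch that pulls the Base64 payload out of a single encoded-word and decodes it. The helper name decodeBase64Word is made up for this example, and it assumes the payload is already UTF-8: the charset field is neither checked nor converted, and Q-encoded words are rejected.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeBase64Word is a hypothetical helper: it decodes one RFC 2047
// encoded-word that uses the B (Base64) encoding. The charset is ignored,
// so the result is only meaningful when the payload is already UTF-8.
func decodeBase64Word(word string) (string, error) {
	// "=?charset?B?text?=" splits on "?" into: "=", charset, "B", text, "=".
	// The Base64 alphabet contains no "?", so a valid word gives 5 parts.
	parts := strings.Split(word, "?")
	if len(parts) != 5 || parts[0] != "=" || parts[4] != "=" {
		return "", errors.New("not an RFC 2047 encoded-word")
	}
	if !strings.EqualFold(parts[2], "B") {
		return "", errors.New("only the B (Base64) encoding is handled here")
	}
	raw, err := base64.StdEncoding.DecodeString(parts[3])
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	s, err := decodeBase64Word("=?UTF-8?B?aGVsbG8=?=")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(s) // prints "hello"
}
```

For a value made of several encoded-words, each word would be decoded separately and the results concatenated; under RFC 2047 the whitespace between adjacent encoded-words is not part of the text.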

There is a decodeRFC2047Word() function in the net/mail package of the Go stdlib, supporting encodings B and Q and charsets UTF-8, US-ASCII and ISO-8859-1. Unfortunately it's not exported, but you're free to take as much inspiration from it as you need ;)

BTW: I just noticed the charset in your example strings is UTF 8, which is a bit odd, since the official name of the encoding is UTF-8.

rob74 answered Nov 16 '22 00:11


Since Go 1.5 you can use mime.WordDecoder.DecodeHeader:

package main

import (
    "fmt"
    "mime"
)

func main() {
    dec := new(mime.WordDecoder)
    header, err := dec.DecodeHeader("=?UTF-8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF-8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF-8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF-8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=")
    if err != nil {
        panic(err)
    }
    fmt.Println(header)
    // Output: الخطوات التي تجمع بين حفظ القرآن الكريم وفهمه مما أملاه العلامة عبد الله الغديان.pdf
}

If you are using an older version of Go, you can use my replacement library: https://github.com/alexcesaro/quotedprintable
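One caveat for the exact string in the question: its charset label is "UTF 8" (with a space), which mime.WordDecoder does not recognize on its own, so DecodeHeader returns an "unhandled charset" error. A possible workaround, sketched below, is to set the decoder's CharsetReader field so that this label is passed through unchanged. The helper name newLenientDecoder is invented for the example, and the sketch assumes the payload behind "UTF 8" really is UTF-8.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// newLenientDecoder is a hypothetical helper that returns a WordDecoder
// which also accepts the non-standard charset label "UTF 8".
func newLenientDecoder() *mime.WordDecoder {
	return &mime.WordDecoder{
		// CharsetReader is consulted only for charsets the decoder does
		// not handle natively (UTF-8, ISO-8859-1 and US-ASCII).
		CharsetReader: func(charset string, input io.Reader) (io.Reader, error) {
			if strings.EqualFold(charset, "utf 8") {
				// Assumption: the bytes are already UTF-8, so pass them through.
				return input, nil
			}
			return nil, fmt.Errorf("unhandled charset %q", charset)
		},
	}
}

func main() {
	dec := newLenientDecoder()
	s, err := dec.DecodeHeader("=?UTF 8?B?aGVsbG8=?= =?UTF 8?B?IHdvcmxk?=")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(s) // prints "hello world"
}
```

Replacing "=?UTF 8?" with "=?UTF-8?" in the input before decoding would be a simpler alternative when the label is known to be the only problem.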

Ale answered Nov 15 '22 23:11