Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Golang encode string UTF16 little endian and hash with MD5

Tags:

hash

go

md5

I am a Go beginner and stuck with a problem. I want to encode a string with UTF16 little endian and then hash it with MD5 (hexadecimal). I have found a piece of Python code, which does exactly what I want. But I am not able to transfer it to Google Go.

md5 = hashlib.md5()
md5.update(challenge.encode('utf-16le'))
response = md5.hexdigest()

The challenge is a variable containing a string.

like image 440
phips Avatar asked Dec 10 '22 20:12

phips


2 Answers

You can do it with less work (or at least more understandability, IMO) by using golang.org/x/text/encoding and golang.org/x/text/transform to create a Writer chain that will do the encoding and hashing without so much manual byte slice handling. The equivalent function:

func utf16leMd5(s string) []byte {
    enc := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewEncoder()
    hasher := md5.New()
    t := transform.NewWriter(hasher, enc)
    t.Write([]byte(s))
    return hasher.Sum(nil)
}
like image 176
hobbs Avatar answered Feb 22 '23 23:02

hobbs


You can use the unicode/utf16 package for UTF-16 encoding. utf16.Encode() returns the UTF-16 encoding of the Unicode code point sequence (slice of runes: []rune). You can simply convert a string to a slice of runes, e.g. []rune("some string"), and you can easily produce the byte sequence of the little-endian encoding by ranging over the uint16 codes and sending/appending first the low byte then the high byte to the output (this is what Little Endian means).

For Little Endian encoding, alternatively you can use the encoding/binary package: it has an exported LittleEndian variable and it has a PutUint16() method.

As for the MD5 checksum, the crypto/md5 package has what you want, md5.Sum() simply returns the MD5 checksum of the byte slice passed to it.

Here's a little function that captures what you want to do:

func utf16leMd5(s string) [16]byte {
    codes := utf16.Encode([]rune(s))
    b := make([]byte, len(codes)*2)
    for i, r := range codes {
        b[i*2] = byte(r)
        b[i*2+1] = byte(r >> 8)
    }
    return md5.Sum(b)
}

Using it:

s := "Hello, playground"
fmt.Printf("%x\n", utf16leMd5(s))

s = "エヌガミ"
fmt.Printf("%x\n", utf16leMd5(s))

Output:

8f4a54c6ac7b88936e990256cc9d335b
5f0db9e9859fd27f750eb1a212ad6212

Try it on the Go Playground.

The variant that uses encoding/binary would look like this:

for i, r := range codes {
    binary.LittleEndian.PutUint16(b[i*2:], r)
}

(Although this is slower as it creates lots of new slice headers.)

like image 34
icza Avatar answered Feb 23 '23 00:02

icza