Why is hex -> base64 so different from base64 -> hex using pack and unpack?

I got this code working, which converts from hex to base64, and vice versa. I got to_base64 from another SO question, and I wrote to_hex with some guesswork and trial and error.

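class String
  # Decode this hex string to raw bytes, then encode those bytes as base64.
  def to_base64
    [[self].pack("H*")].pack("m0")
  end

  # Decode this base64 string to raw bytes, then encode those bytes as hex.
  def to_hex
    self.unpack("m0").first.unpack("H*").first
  end
end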

But I don't really grok the pack and unpack methods, even after reading the docs. Specifically, I'm confused by the asymmetry between the two implementations. Conceptually, in both cases, we take a string encoded in some base (16 or 64), and we wish to convert it to another base. So why can't we implement to_hex like this:

def to_hex
  [[self].pack("m0")].pack("H*")
end

or to_base64 using unpack? Why does the base we choose completely change the methods we need to use for the conversion?

asked Sep 20 '13 18:09 by Jonah



1 Answer

to_hex is the exact inverse of to_base64:

to_base64

  1. put string in an array: [self]
  2. call pack with H*: [self].pack("H*")
  3. put string in an array: [[self].pack("H*")]
  4. call pack with m0: [[self].pack("H*")].pack("m0")
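The four steps above can be traced with a concrete value. This is only an illustrative sketch: the sample string and the variable names hex, raw and b64 are made up for the example and are not part of the original code.

```ruby
hex = "48656c6c6f"        # hex text for the five bytes of "Hello"

raw = [hex].pack("H*")    # steps 1-2: hex text -> raw bytes
b64 = [raw].pack("m0")    # steps 3-4: raw bytes -> base64 text (no line feeds)

puts raw                  # Hello
puts b64                  # SGVsbG8=
```

Both pack calls take an array because pack is a method of Array. "H*" consumes hex digits and produces bytes, while "m0" consumes bytes and produces base64 characters.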

to_hex

  1. call unpack with m0: self.unpack("m0")
  2. extract string from array: self.unpack("m0").first
  3. call unpack with H*: self.unpack("m0").first.unpack("H*")
  4. extract string from array: self.unpack("m0").first.unpack("H*").first
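The same kind of trace works for to_hex. Again, this is a sketch with a made-up sample value and hypothetical variable names (b64, raw, hex):

```ruby
b64 = "SGVsbG8="               # base64 text for the bytes of "Hello"

raw = b64.unpack("m0").first   # steps 1-2: base64 text -> raw bytes
hex = raw.unpack("H*").first   # steps 3-4: raw bytes -> hex text

puts raw                       # Hello
puts hex                       # 48656c6c6f
```

unpack is a method of String and always returns an array, which is why each call is followed by .first.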

That's how you undo a sequence of operations: apply the inverse of each operation, in reverse order:

a = 5
(a + 4) * 3
#=> 27

And the other way around:

a = 27
(a / 3) - 4
#=> 5

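For a given directive, pack is the inverse of unpack, and .first is the inverse of wrapping a value in an array with [a]. Note that the two directives point in opposite directions: pack("H*") turns hex text into raw bytes, whereas pack("m0") turns raw bytes into base64 text. That is why to_hex cannot be written as two pack calls in swapped order.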
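A quick way to check this is to run a value through both directions and compare it with the variant proposed in the question. This is a minimal sketch; the sample value and the names round_trip and swapped are assumptions made for illustration.

```ruby
hex = "48656c6c6f"

# to_base64 followed by to_hex, written out inline
b64        = [[hex].pack("H*")].pack("m0")
round_trip = b64.unpack("m0").first.unpack("H*").first

puts round_trip == hex    # true

# The variant from the question: it base64-encodes the base64 text a
# second time and then reads that result as if it were hex digits.
swapped = [[b64].pack("m0")].pack("H*")

puts swapped == hex       # false
```

The swapped version runs without raising an error, but what it returns is not the original hex string.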

answered Sep 21 '22 17:09 by Stefan