
Base62 hash of a string

I want to do something like fingerprint = Digest::SHA256.base64digest(str), but with base62 instead of base64. How can I efficiently build a unique base62-encoded hash of an arbitrary string?

mahemoff asked May 28 '14 13:05



1 Answer

Base 64 is widely used to encode binary data because 6 bits fit exactly into one character, and there are still enough printable ASCII characters to represent all of the possible patterns. In other words, the 64 available characters represent all 64 different bit patterns from decimal 0 up to decimal 63.
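To make the 6-bit mapping concrete, here is a minimal sketch that encodes a string by hand and compares the result with the standard library. The helper name naive_base64 is made up for illustration; in real code you would simply call Base64.strict_encode64.

```ruby
require 'base64'

# The standard base 64 alphabet (RFC 4648): A-Z, a-z, 0-9, "+", "/"
B64_ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].join.freeze

# Hypothetical helper for illustration only: maps each 6-bit chunk to one character
def naive_base64(bytes)
  bits = bytes.unpack1('B*')              # bit string, most significant bit first
  bits += '0' * (-bits.length % 6)        # zero-pad to a multiple of 6 bits
  encoded = bits.scan(/.{6}/).map { |chunk| B64_ALPHABET[chunk.to_i(2)] }.join
  encoded + '=' * (-encoded.length % 4)   # pad the output to a multiple of 4 characters
end

naive_base64('foo') == Base64.strict_encode64('foo')
#=> true
```

No arithmetic on the whole input is needed: every chunk is looked up independently, which is why this scales linearly with the input length.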

There are several problems with encoding binary data as base 62, all stemming from the fact that an alphabet of size 62 just isn't a good fit. You could split the binary data from the digest algorithm into 5-bit chunks and then assign each of these chunks to a character. However, a 5-bit chunk can only take 32 different values, so the characters above "v" would never be used and you would essentially end up with a base 32 encoding.

In terms of efficiency, base 62 will never come close to base 64. Base 64 encoding is dead simple: take 6 bits, map them onto a character, repeat until done. This works because 64 is a power of 2. With base 62, however, you have to convert the data to an integer and carry over the remainder at each step, because the bit patterns do not fit evenly.
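For comparison, the integer-and-remainder approach can be sketched in a few lines without any gem. The helper name base62_digest and the alphabet order (digits, then lowercase, then uppercase) are assumptions made for this example; other libraries may order the alphabet differently and produce different strings.

```ruby
require 'digest'

# Assumed alphabet order: 0-9, a-z, A-Z
B62_ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].join.freeze

# Hypothetical helper: treats the bytes as one big-endian integer and
# repeatedly divides by 62, collecting the remainders as digits
def base62_digest(bytes)
  n = bytes.unpack1('H*').to_i(16)
  return B62_ALPHABET[0] if n.zero?
  out = +''
  while n > 0
    n, rem = n.divmod(62)
    out << B62_ALPHABET[rem]
  end
  out.reverse   # remainders come out least significant digit first
end

fingerprint = base62_digest(Digest::SHA256.digest('foo'))
```

Every division touches the whole big integer, so the cost grows roughly quadratically with the input length. For a 32-byte digest that hardly matters, and the result is at most 43 characters long. Leading zero bytes are dropped, which is fine for a one-way fingerprint but not for a reversible encoding.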

So my advice, which you may not like, is to use a different encoding.

--

If you need a URL-safe encoding, you can, for example, use one of these:

require 'digest'
require 'base64'

# sample string
str = 'foo'

# original base 64 method for comparison
Digest::SHA256.base64digest(str)
#=> "LCa0a2j/xo/5m0U8HTBBNBNCLXBkg7+g+YpeiGJm564="

# url safe variant (no slash or plus characters)
Base64.urlsafe_encode64(Digest::SHA256.digest(str))
#=> "LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564="

# hexadecimal (base 16)
Digest::SHA256.hexdigest(str)
#=> "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"

# or base 32
# gem install base32
require 'base32'
Base32.encode(Digest::SHA256.digest(str))
#=> "FQTLI23I77DI76M3IU6B2MCBGQJUELLQMSB37IHZRJPIQYTG46XA===="

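# or with direct url encoding
# not pretty, but url safe!
# (URI::encode was removed in Ruby 3.0; encode_www_form_component also escapes "," and "&")
require 'uri'
URI.encode_www_form_component(Digest::SHA256.digest(str))
#=> "%2C%26%B4kh%FF%C6%8F%F9%9BE%3C%1D0A4%13B-pd%83%BF%A0%F9%8A%5E%88bf%E7%AE"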

# or url escaped base 64
# not pretty, but url safe!
require 'cgi'
CGI::escape(Digest::SHA256.base64digest(str))
#=> "LCa0a2j%2Fxo%2F5m0U8HTBBNBNCLXBkg7%2Bg%2BYpeiGJm564%3D"

--

Edit: and here's a very very very inefficient implementation of base62 ;-)

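# gem install base62
require 'digest'
require 'base62'

# interpret the bytes as one little-endian integer
def pack_int(str)
  str.unpack('C*').each_with_index.reduce(0){|r,(x,i)| r + (x << 8*i) }
end

# inverse of pack_int; trailing zero bytes of the original string are not restored
def unpack_int(int)
  n = (int.bit_length + 7) / 8   # exact byte count; Math.log2 undercounts for powers of 256
  n.times.map{|i| (int >> 8*i) & 255 }.pack('C*')
end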

def base62_encode(str)
  Base62.encode(pack_int(str))
end

def base62_decode(encoded)
  unpack_int(Base62.decode(encoded))
end

str = "foo"

# encode
digest = Digest::SHA256.digest(str)
fingerprint = base62_encode(digest)
#=> "fTSIMrZT3fDTvW7XDBq1b7nhWa24Zl55EVpsaO3TBBE"

# decode
recovered_digest = base62_decode(fingerprint)
#=> ",&\xB4kh\xFF\xC6\x8F\xF9\x9BE<\x1D0A4\x13B-pd\x83\xBF\xA0\xF9\x8A^\x88bf\xE7\xAE"

digest == recovered_digest
#=> true
Patrick Oscity answered Oct 20 '22 18:10