I have a 128-bit number in hexadecimal stored in a string (from md5, security isn't a concern here) that I'd like to convert to a base-36 string. If it were a 64-bit or less number I'd convert it to a 64-bit integer then use an algorithm I found to convert integers to base-36 strings but this number is too large for that so I'm kind of at a loss for how to approach this. Any guidance would be appreciated.
Edit: After Roland Illig pointed out the hassle of saying 0/O and 1/l over the phone and not gaining much data density over hex I think I may end up staying with hex. I'm still curious though if there is a relatively simple way to convert an hex string of arbitrary length to a base-36 string.
To convert a hexadecimal string to a numberUse the ToInt32(String, Int32) method to convert the number expressed in base-16 to an integer. The first argument of the ToInt32(String, Int32) method is the string to convert. The second argument describes what base the number is expressed in; hexadecimal is base 16.
hex() function is one of the built-in functions in Python3, which is used to convert an integer number into it's corresponding hexadecimal form. Syntax : hex(x) Parameters : x - an integer number (int object) Returns : Returns hexadecimal string.
The most common and effective way to convert hex into an integer in Python is to use the type-casting function int() . This function accepts two arguments: one mandatory argument, which is the value to be converted, and a second optional argument, which is the base of the number format with the default as 10 .
To convert Python String to hex, use the inbuilt hex() method. The hex() is a built-in method that converts the integer to a corresponding hexadecimal string. For example, use the int(x, base) function with 16 to convert a string to an integer.
A base-36 encoding requires 6 bits to store each token. Same as base-64 but not using 28 of the available tokens. Solving 36^n >= 2^128 yields n >= log(2^128) / log(36) or 25 tokens to encode the value.
A base-64 encoding also requires 6 bits, all possible token values are used. Solving 64^n >= 2^128 yields n >= log(2^128) / log(64) or 22 tokens to encode the value.
Calculating the base-36 encoding requires dividing by powers of 36. No easy shortcuts, you need a division algorithm that can work with 128-bit values. The base-64 encoding is much easier to compute since it is a power of 2. Just take 6 bits at a time and shift by 6, in total 22 times to consume all 128 bits.
Why do you want to use base-36? Base-64 encoders are standard. If you really have a constraint on the token space (you shouldn't, ASCII rulez) then at least use a base-32 encoding. Or any power of 2, base-16 is hex.
If the only thing that is missing is the support for 128 bit unsigned integers, here is the solution for you:
#include <stdio.h>
#include <inttypes.h>
typedef struct {
uint32_t v3, v2, v1, v0;
} uint128;
static void
uint128_divmod(uint128 *out_div, uint32_t *out_mod, const uint128 *in_num, uint32_t in_den)
{
uint64_t x = 0;
x = (x << 32) + in_num->v3;
out_div->v3 = x / in_den;
x %= in_den;
x = (x << 32) + in_num->v2;
out_div->v2 = x / in_den;
x %= in_den;
x = (x << 32) + in_num->v1;
out_div->v1 = x / in_den;
x %= in_den;
x = (x << 32) + in_num->v0;
out_div->v0 = x / in_den;
x %= in_den;
*out_mod = x;
}
int
main(void)
{
uint128 x = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
uint128 result;
uint32_t mod;
uint128_divmod(&result, &mod, &x, 16);
fprintf(stdout, "%08"PRIx32" %08"PRIx32" %08"PRIx32" %08"PRIx32" rest %08"PRIx32"\n", result.v3, result.v2, result.v1, result.v0, mod);
return 0;
}
Using this function you can repeatedly compute the mod-36 result, which leads you to the number encoded as base-36.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With