 

Decode table construction for base64

Tags:

c

base64

I am reading this libb64 source code for encoding and decoding base64 data.

I know the encoding procedure, but I can't figure out how the following decoding table is constructed for fast lookup when decoding base64 characters. This is the table they are using:

static const char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};

Can someone explain how the values in this table are used for decoding?

cyber_raj asked Jul 19 '12 10:07



1 Answer

It's a shifted and limited ASCII translation table. The indices of the table correspond to ASCII codes, and the entries are the decoded base64 values. The table is shifted so that index 0 corresponds to the ASCII character + (code 43), and each further index corresponds to the next ASCII character after +, up to z (code 122). The first entry therefore maps the ASCII character + to the base64 value 62. The next three entries (ASCII , - .) are not base64 characters and hold -1, and the entry after them maps the next character, ASCII /, to the base64 value 63.

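The rest becomes obvious if you compare the table with an ASCII chart: the digits 0-9 map to 52-61, A-Z map to 0-25, a-z map to 26-51, = (the padding character) is marked with -2, and every other character in between holds -1.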
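One way to see where the numbers come from is to generate the table from the base64 alphabet. The sketch below assumes an ASCII execution character set; build_decoding_table, TABLE_FIRST, TABLE_LAST and TABLE_SIZE are names made up for illustration and are not part of libb64, which ships the table as a literal.

```c
#include <string.h>

/* The standard base64 alphabet: a character's position is its 6-bit value. */
static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

#define TABLE_FIRST '+'                              /* lowest character covered  */
#define TABLE_LAST  'z'                              /* highest character covered */
#define TABLE_SIZE  (TABLE_LAST - TABLE_FIRST + 1)   /* 80 entries in ASCII       */

/* Hypothetical helper: fill table[] using the same layout as the literal. */
void build_decoding_table(signed char table[TABLE_SIZE])
{
    int i;

    /* Default every entry to -1: "not a base64 character". */
    memset(table, -1, TABLE_SIZE);

    /* Each alphabet character stores its own position (0..63),
       at the index obtained by subtracting '+' from its ASCII code. */
    for (i = 0; i < 64; i++)
        table[alphabet[i] - TABLE_FIRST] = (signed char)i;

    /* The padding character gets its own marker. */
    table['=' - TABLE_FIRST] = -2;
}
```

Filling a signed char table[TABLE_SIZE] with this function gives the same 80 entries as the literal in the question.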

Its usage is something like this:

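int decode_base64(char ch) {
    /* anything outside '+'..'z' has no entry in the table */
    if (ch < '+' || ch > 'z') {
        return SOME_INVALID_CH_ERROR;
    }

    /* shift range into decoding table range */
    ch -= '+';

    int base64_val = decoding[(int)ch];

    /* -1 marks a non-base64 character, -2 marks the '=' padding character */
    if (base64_val < 0) {
        return SOME_INVALID_CH_ERROR;
    }

    return base64_val;
}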
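Once each character has been turned into a 6-bit value, four of those values pack into three output bytes. The following is a simplified sketch of that step, not libb64's actual decoding loop; decode_value and decode_quad are hypothetical helper names, and padded groups (those containing =) are rejected here rather than handled.

```c
/* Same table as in the question; index 0 corresponds to '+'. */
static const signed char decoding[] = {62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-2,-1,-1,-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1,-1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51};

/* Hypothetical helper: 0..63 for an alphabet character,
   -2 for '=', -1 for anything else. */
int decode_value(char ch)
{
    if (ch < '+' || ch > 'z')
        return -1;
    return decoding[ch - '+'];
}

/* Hypothetical helper: decode one unpadded group of four characters
   into three bytes. Returns 0 on success, -1 if any character is
   not in the base64 alphabet (including '='). */
int decode_quad(const char in[4], unsigned char out[3])
{
    int v[4];
    int i;

    for (i = 0; i < 4; i++) {
        v[i] = decode_value(in[i]);
        if (v[i] < 0)
            return -1;
    }

    /* 4 x 6 bits = 24 bits = 3 bytes */
    out[0] = (unsigned char)((v[0] << 2) | (v[1] >> 4));
    out[1] = (unsigned char)(((v[1] & 0x0F) << 4) | (v[2] >> 2));
    out[2] = (unsigned char)(((v[2] & 0x03) << 6) | v[3]);
    return 0;
}
```

For example, passing the group "TWFu" to decode_quad fills out with the three bytes of "Man".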
orlp answered Sep 30 '22 13:09