
Range of valid character for a base 64 encoding

I am interested in the following:
Is there a list of characters that would never occur as part of a base 64 encoded string?
Take * as an example: I am not sure whether it can occur in the output. And if the original input itself contained a *, would that character be encoded as something else?

Jim asked Nov 02 '12 12:11



1 Answer

Here is what I could turn up: RFC 4648

It includes this convenient table:

                  Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y

So a regular expression that matches any character that should never appear in Base 64 encodings would be:

[^A-Za-z0-9+/=] 
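As a rough illustration of how that pattern could be used, here is a minimal Python sketch. The helper name invalid_chars and the sample strings are made up for this example. It also touches on the * case from the question: a * in the input is just another byte to the encoder, so it comes out as characters from the alphabet, never as a literal *.

```python
import base64
import re

# Matches any character outside the standard RFC 4648 alphabet (Table 1)
# and the padding character '='.
NON_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")


def invalid_chars(text: str) -> list:
    """Return every character in `text` that standard Base 64 never emits.

    Hypothetical helper for illustration; it only checks the character set,
    not length or padding rules.
    """
    return NON_BASE64.findall(text)


# A '*' in the *input* is the single byte 0x2A; the encoder maps it onto
# the alphabet like any other byte.
encoded_star = base64.b64encode(b"*").decode("ascii")

print(encoded_star)                # Kg==
print(invalid_chars(encoded_star)) # []
print(invalid_chars("SGVsbG8*"))   # ['*']
```

Note that this only tells you whether a string stays inside the standard character set. A string can pass this check and still be malformed Base 64 (wrong length, misplaced padding).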

However, as kapep's answer points out, this is only the recommended alphabet. Specific implementations might choose a different set of 64 characters. (In fact, even the linked RFC contains an alternative table for URL- and filename-safe encoding, which replaces characters 62 and 63 with - and _ respectively.) So it really depends on the implementation that created the encoding.
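To make the difference between the two RFC alphabets concrete, here is a small Python sketch using the standard library's base64 module. The input bytes are picked for this example only because they happen to produce values 62 and 63; the name URL_SAFE_INVALID is also just illustrative.

```python
import base64
import re

# Two bytes chosen so that the 6-bit groups are 62, 63 and 60.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# For the URL- and filename-safe alphabet the "never occurs" pattern
# changes accordingly: '+' and '/' become invalid, '-' and '_' valid.
URL_SAFE_INVALID = re.compile(r"[^A-Za-z0-9_=-]")

print(URL_SAFE_INVALID.findall(url_safe))  # []
print(URL_SAFE_INVALID.findall(standard))  # ['+', '/']
```

So a character that "never occurs" under one alphabet can be perfectly valid under the other, which is why you need to know which variant produced the string before validating it.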

Martin Ender answered Sep 29 '22 21:09