Most numbering systems start with zero, go through the base-10 digits, and then go to letters once the base-10 digits have been exhausted:
Binary: 0,1
Octal: 0,1,2,3,4,5,6,7
Decimal: 0,1,2,3,4,5,6,7,8,9
Hexadecimal: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
Even the ASCII ordering of characters puts digits before letters.
The Base64 encoding scheme does things differently:
┌──────┬──────────┬┬──────┬──────────┬┬──────┬──────────┬┬──────┬──────────┐
│Value │ Encoding ││Value │ Encoding ││Value │ Encoding ││Value │ Encoding │
├──────┼──────────┼┼──────┼──────────┼┼──────┼──────────┼┼──────┼──────────┤
│ 0 │ A ││ 17 │ R ││ 34 │ i ││ 51 │ z │
│ 1 │ B ││ 18 │ S ││ 35 │ j ││ 52 │ 0 │
│ 2 │ C ││ 19 │ T ││ 36 │ k ││ 53 │ 1 │
│ 3 │ D ││ 20 │ U ││ 37 │ l ││ 54 │ 2 │
│ 4 │ E ││ 21 │ V ││ 38 │ m ││ 55 │ 3 │
│ 5 │ F ││ 22 │ W ││ 39 │ n ││ 56 │ 4 │
│ 6 │ G ││ 23 │ X ││ 40 │ o ││ 57 │ 5 │
│ 7 │ H ││ 24 │ Y ││ 41 │ p ││ 58 │ 6 │
│ 8 │ I ││ 25 │ Z ││ 42 │ q ││ 59 │ 7 │
│ 9 │ J ││ 26 │ a ││ 43 │ r ││ 60 │ 8 │
│ 10 │ K ││ 27 │ b ││ 44 │ s ││ 61 │ 9 │
│ 11 │ L ││ 28 │ c ││ 45 │ t ││ 62 │ + │
│ 12 │ M ││ 29 │ d ││ 46 │ u ││ 63 │ / │
│ 13 │ N ││ 30 │ e ││ 47 │ v ││ │ │
│ 14 │ O ││ 31 │ f ││ 48 │ w ││(pad) │ = │
│ 15 │ P ││ 32 │ g ││ 49 │ x ││ │ │
│ 16 │ Q ││ 33 │ h ││ 50 │ y ││ │ │
└──────┴──────────┴┴──────┴──────────┴┴──────┴──────────┴┴──────┴──────────┘
Is there a reason why Base64 puts letters before digits? Wouldn't it have made more sense for the value 0 to be represented by the encoding 0?
I was recently looking into general base conversions and came across this exact same question. It is interesting that in 6+ years no one has commented on it. While I don't have a specific answer, here is some supporting information:
The "Base64" you mention is the one specified in RFC 4648. I found and read the spec; at the very end it lists various contributors and the RFC's author, Simon Josefsson. A contact e-mail is given there, so if anyone might know the answer, that is probably the place to start.
There is nothing sacred about RFC 4648, meaning "Base64" doesn't inherently need to obey that recommended standard. Except, of course, that various libraries have implemented it in that way, across many languages, and it has ended up being used widely in encoding e-mails -- and clearly works well at transmitting binary-image data across ancient e-mail systems.
But it seems to me that RFC 4648 is used "just 'cause" of that legacy establishment, not because it is the "best" solution. Every explanation of this "Base64" starts straight off with the division into 6-bit groups, etc., without any insight into the more fundamental "why". That is, the articles seem to assume RFC 4648 is "the" standard for Base64 encoding (as opposed to "a" standard). If we instead used the more straightforward approach of starting with 0-9 instead of A-Z, what would break or change in the fundamental goal of conveying binary data across systems? For any general base conversion, you're just indexing into a series of "acceptable printable characters" (and any decoder must know the original series used). Anyhow, I agree that starting with letters instead of digits does seem "odd", with no apparent rationale.
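To make the "just indexing into a series of characters" point concrete, here is a minimal Python sketch. The names encode64, decode64 and DIGITS_FIRST are made up for this illustration, and the digits-first ordering is a hypothetical alphabet, not any standard. The only thing that differs between the two encodings is the 64-character string passed in:

```python
import base64
import string

# RFC 4648 standard alphabet: A-Z, a-z, 0-9, then + and /
RFC4648 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Hypothetical digits-first ordering, for illustration only (not a standard)
DIGITS_FIRST = string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/"


def encode64(data: bytes, alphabet: str, pad: str = "=") -> str:
    """Encode bytes as 6-bit groups, using each group as an index into alphabet."""
    assert len(alphabet) == 64
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filled on the right
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split into four 6-bit values, most significant first
        symbols = [alphabet[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 symbols, 2 bytes -> 3 symbols, 3 bytes -> 4 symbols
        keep = len(chunk) + 1
        out.append("".join(symbols[:keep]) + pad * (4 - keep))
    return "".join(out)


def decode64(text: str, alphabet: str, pad: str = "=") -> bytes:
    """Inverse of encode64; the decoder must be given the same alphabet."""
    index = {c: i for i, c in enumerate(alphabet)}
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4].rstrip(pad)
        n = 0
        for c in group:
            n = (n << 6) | index[c]
        # Re-align a short final group to the full 24 bits
        n <<= 6 * (4 - len(group))
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)


print(encode64(b"\x00\x00\x00", RFC4648))       # AAAA
print(encode64(b"\x00\x00\x00", DIGITS_FIRST))  # 0000
```

With the RFC 4648 alphabet this sketch should agree with the standard library's base64.b64encode; with the digits-first alphabet the output still round-trips, it just can't be read by a standard decoder. So as far as this experiment goes, nothing "breaks" except interoperability.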
This doesn't answer the specific question, but I hope it starts more of a discussion about it. We may just need to set up an experiment of "what if we did just change the order of the symbols used" and maybe some actual reason might manifest. One reason might just be that this shift is an intentional obfuscation, to make it less obvious that an arbitrary "safe set of symbols" is being used to convey binary data.
EDIT: On "obfuscation" as an answer, consider... In a given stream of data, one would normally think that "0" or "00" means a numeric value of 0 (or a byte of 00000000). In RFC 4648, instead, "A" means 000000 (a 6-bit sequence of 0's). So it is "Base 64" in that a set of 64 symbols is involved. But once you define a set of "base symbols" you can convert to any Base-N (assuming you have enough symbols). So, whatever your sequence is, it now "feels wrong" when your Base-64 doesn't align with the RFC 4648 that is put forth as "the Base-64 standard." But the scope and purpose of RFC 4648 seem to go beyond a generalized approach to Base-64: it has to accommodate intermediate processing systems that maybe don't support 8-bit or even 7-bit data (most mainstream developers may find it hard to believe such systems still exist). Anyway, it was just jarring to me when explanations of "Base-64" immediately jump into RFC 4648 rather than explaining the general concept: Base-64 is the same as any other base, just with 64 distinct symbols, regardless of what those symbols are and their order.
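As a quick check of the "A means 000000" point, here is a small Python sketch using the standard library's base64 module. The helper names and the digits-first alphabet are assumptions made up for this example, not part of any standard. It relabels standard Base64 output symbol for symbol, which is all a change of symbol order amounts to:

```python
import base64
import string

# Three zero bytes are four all-zero 6-bit groups, so RFC 4648 gives "AAAA"
print(base64.b64encode(b"\x00\x00\x00"))  # b'AAAA'

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Hypothetical digits-first ordering, for illustration only
DIGITS_FIRST = string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/"

# One-to-one relabeling tables between the two alphabets ("=" is left alone)
TO_DIGITS_FIRST = str.maketrans(STANDARD, DIGITS_FIRST)
FROM_DIGITS_FIRST = str.maketrans(DIGITS_FIRST, STANDARD)


def encode_digits_first(data: bytes) -> str:
    """Standard Base64, then swap each symbol for its digits-first counterpart."""
    return base64.b64encode(data).decode("ascii").translate(TO_DIGITS_FIRST)


def decode_digits_first(text: str) -> bytes:
    """Swap symbols back to the RFC 4648 alphabet, then decode as usual."""
    return base64.b64decode(text.translate(FROM_DIGITS_FIRST))


print(encode_digits_first(b"\x00\x00\x00"))  # 0000
```

Under the digits-first relabeling, zero bits do show up as "0", which is what one would naively expect. The data carried is identical either way.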