I'm looking for a decent BaseN encoder (with custom charset) in Java, that is not limited by input data size (array of bytes).
Something like this:
https://github.com/mklemm/base-n-codec-java
But for "unlimited" data length, without any unnecessary memory/performance penalty or "BigInteger abuse magic". Simply something that works like the standard Base64 encoders, but universally for any base/charset. Any solution, or idea on how to achieve that, is welcome.
Maybe someone has experience with the Apache Commons Codec BaseNCodec:
https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/BaseNCodec.html
It looked promising; however, it's an abstract class, and writing an implementation on top of it looks harder than starting from scratch.
(e.g. "ABCDE" = Base5, "ABCDE-+*/." = Base10, ...)
https://dzone.com/articles/base-x-encoding
General answer: No. Special case: Yes, for bases that are a power of 2.
Why? Because the requirements in the question are in "strong competition" (actually, probably "contradiction") with each other.
Can you determine results of multiplication and division operations without carrying out multiplication and division calculations? NO. It's a contradiction. When you get the results, by definition, you've carried out the calculation.
So it's not a question of whether you can avoid the calculations, but of how to streamline them.
Hope that helps your approach. :)
A base N encoding is quite efficient if N is a power of 2, because the conversion can then happen between fixed-size groups of digits and fixed-size groups of bytes.
Base64: 64 = 2^6, i.e. 6 bits per digit, hence 4 digits = 24 bits = 3 bytes.
Otherwise schoolbook multiplication/division must run over the entire length of the input, which means a lot of "BigInteger"-style calculation.
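To make that cost concrete, here is a minimal sketch of the schoolbook approach without BigInteger: the byte array is treated as one big-endian unsigned number and divided by N again and again, each pass yielding one digit. The class and method names are made up for illustration, and the sketch assumes the alphabet contains unique characters. Leading zero bytes are not preserved, and the running time grows quadratically with the input length, which is exactly the penalty discussed above.

```java
class AnyBaseEncoder {
    // Encodes data as the digits of one big number in base alphabet.length().
    // Hypothetical helper, not part of any library.
    static String encode(byte[] data, String alphabet) {
        int base = alphabet.length();
        if (base < 2) {
            throw new IllegalArgumentException("alphabet needs at least 2 symbols");
        }
        // Work on an unsigned copy, because Java bytes are signed.
        int[] num = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            num[i] = data[i] & 0xFF;
        }
        StringBuilder out = new StringBuilder();
        int start = 0;
        // Skip leading zero bytes; they carry no value.
        while (start < num.length && num[start] == 0) {
            start++;
        }
        while (start < num.length) {
            // One long division of the whole remaining number by base.
            int remainder = 0;
            for (int i = start; i < num.length; i++) {
                int cur = remainder * 256 + num[i];
                num[i] = cur / base;
                remainder = cur % base;
            }
            // The remainder is the next least significant digit.
            out.append(alphabet.charAt(remainder));
            while (start < num.length && num[start] == 0) {
                start++;
            }
        }
        if (out.length() == 0) {
            out.append(alphabet.charAt(0)); // the value zero
        }
        return out.reverse().toString();
    }
}
```

For example, AnyBaseEncoder.encode(new byte[] {1, 0}, "0123456789") treats the input as the number 256 and returns "256".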
Somewhat faster than, for instance, repeatedly multiplying/dividing by the base N is keeping an array of the powers of N.
For encoding a byte array to digits you could use N^0, N^1, N^2, N^3, ... stored as byte arrays no longer than the input, and do repeated subtractions.
As byte is signed, short might be better suited. Say the highest byte of the number is 98 and the highest byte of the largest N-power less than or equal to it is 12; then that digit is roughly 98 / 12, i.e. about 7 or 8.
For decoding of digits to a byte array the same powers might be used.
Have fun.
You mention two very different approaches. The BaseN algorithm used in the GitHub implementation relies on the mathematical notion of converting an integer between bases. This is equivalent to saying that decimal 10 is the same as 12 in base 8 or 1010 in base 2. The algorithm interprets the byte stream as one large number and converts it to the target base.
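As a small illustration of that integer view, the sketch below renders a plain long in the base given by a custom alphabet. The class and method names are invented for this example; a real BaseN codec does the same thing on a number far too large for a long.

```java
class RadixDemo {
    // Renders a non-negative value using the characters of alphabet as digits.
    // Hypothetical helper for illustration only.
    static String toBase(long value, String alphabet) {
        int base = alphabet.length();
        if (base < 2) {
            throw new IllegalArgumentException("alphabet needs at least 2 symbols");
        }
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (value == 0) {
            return String.valueOf(alphabet.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            // Least significant digit first, reversed at the end.
            sb.append(alphabet.charAt((int) (value % base)));
            value /= base;
        }
        return sb.reverse().toString();
    }
}
```

With this, RadixDemo.toBase(10, "01234567") gives "12" and RadixDemo.toBase(10, "01") gives "1010", matching the examples above.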
Base64 is a very different approach; you can see an example on the Wikipedia Base64 page. The algorithm basically splits the input stream into a sequence of 6-bit elements. 2^6 = 64, hence the name Base64. It has a table of 64 different characters and maps each 6-bit element to the corresponding character in that table.
I think that you need to select one of the two approaches, since they are very different and not compatible with each other. As for the implementation details, if you opt for the second method, it would be easier to implement, I think, since you basically split the stream into fixed-size parts and encode each part according to your own table.
The first method can get quite complicated, since arbitrary-precision arithmetic relies on quite complex constructs. You can have a look at existing software, again on Wikipedia: the list of arbitrary-precision arithmetic software.
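The bit-splitting idea generalizes to any alphabet whose size is a power of 2. Below is a rough sketch, not a drop-in replacement for a real Base64 codec: the names are invented, no padding characters are emitted, and the alphabet is assumed to have 2, 4, 8, ..., 256 unique characters. It needs only a small bit buffer, so the input length does not matter.

```java
class PowerOfTwoCodec {
    // Encodes data by cutting the bit stream into groups of log2(base) bits.
    // Hypothetical helper; emits no padding.
    static String encode(byte[] data, String alphabet) {
        int base = alphabet.length();
        if (base < 2 || base > 256 || Integer.bitCount(base) != 1) {
            throw new IllegalArgumentException("alphabet size must be a power of 2 in [2, 256]");
        }
        int bitsPerDigit = Integer.numberOfTrailingZeros(base);
        int mask = base - 1;
        StringBuilder out = new StringBuilder();
        int buffer = 0;   // pending bits, right-aligned
        int bitCount = 0; // number of pending bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;
            while (bitCount >= bitsPerDigit) {
                bitCount -= bitsPerDigit;
                out.append(alphabet.charAt((buffer >>> bitCount) & mask));
            }
            buffer &= (1 << bitCount) - 1; // keep only the unconsumed bits
        }
        if (bitCount > 0) {
            // Left-align the leftover bits and fill the rest with zeros.
            out.append(alphabet.charAt((buffer << (bitsPerDigit - bitCount)) & mask));
        }
        return out.toString();
    }
}
```

Called with the standard 64-character Base64 table and the ASCII bytes of "Man", this returns "TWFu"; with "0123456789ABCDEF" it behaves as a hex encoder.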
Realistically, I think at some point you will find it hard to find enough characters for your conversions (as the base or the number of bits goes up), unless you use the whole Unicode alphabet :).
Hope I helped a bit