I see that there are many base64 implementations available in the opensource and I found multiple internal implementations in a product that I am maintaining.
I'm trying to factor out duplicates but I am not 100% certain that all these implementations give identical output. Therfore I need to have a dataset that tests all possible combinations of input.
Is that somewhere available ? google search did not really report it.
I saw a similar question on stackoverflow but that one has not been fully answered and it is actually just asking for one phrase (in ascii) that would test all 64 chars. It does not handle padding with = for example. So one test string will certainly not fit the bill for a 100% test.
In base64 encoding, the character set is [A-Z, a-z, 0-9, and + /] . If the rest length is less than 4, the string is padded with '=' characters. ^([A-Za-z0-9+/]{4})* means the string starts with 0 or more base64 groups.
Encoding files is not encryption and should never be used to secure sensitive data on disk. Rather it is a useful way of transferring or storing large data in the form of a string. While it may obfuscate that actual data from should surfers, anyone who has access to base64 encoded data can easily decode it.
The schema is SYSTOOLS. A character expression to be encoded. The maximum length in 2732 characters.
How fast can you decode base64 data? On a recent Intel processor, it takes roughly 2 cycles per byte (from cache) when using a fast decoder like the one from the Chrome browser.
Perhaps something like Base64Test in Bouncy Castle would do what you want?. The tricky part in base64 is handling the padding correctly. It's certainly important to cover that as you mentioned. Accordingly, RFC 4648 specifies these test vectors:
   BASE64("") = ""
   BASE64("f") = "Zg=="
   BASE64("fo") = "Zm8="
   BASE64("foo") = "Zm9v"
   BASE64("foob") = "Zm9vYg=="
   BASE64("fooba") = "Zm9vYmE="
   BASE64("foobar") = "Zm9vYmFy"
Some of your implementations may produce base64 output that differs only by whether they insert line breaks, and where implementations that break lines insert the break and the line termination used. You would have to do additional testing to determine whether you can safely replace an implementation that's using one style with a different one. In particular, a decoder might make assumptions about line length or termination.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With