
Is there a dataset available to fully test a base64 encode/decoder?

I see that there are many open-source base64 implementations available, and I have found multiple internal implementations in a product that I am maintaining.

I'm trying to factor out the duplicates, but I am not 100% certain that all these implementations give identical output. Therefore I need a dataset that tests all relevant combinations of input.

Is such a dataset available somewhere? A Google search did not turn one up.

I saw a similar question on Stack Overflow, but it has not been fully answered, and it only asks for one phrase (in ASCII) that would exercise all 64 characters. It does not cover padding with =, for example. So a single test string will certainly not be enough for a 100% test.

David Nouls asked Aug 22 '12 09:08




1 Answer

Perhaps something like Base64Test in Bouncy Castle would do what you want? The tricky part of base64 is handling the padding correctly, so it is important to cover that, as you mentioned. RFC 4648 specifies these test vectors:

   BASE64("") = ""
   BASE64("f") = "Zg=="
   BASE64("fo") = "Zm8="
   BASE64("foo") = "Zm9v"
   BASE64("foob") = "Zm9vYg=="
   BASE64("fooba") = "Zm9vYmE="
   BASE64("foobar") = "Zm9vYmFy"
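
As a rough sketch, those vectors could be turned into a small table-driven check that any encode/decode pair can be run through. The helper name `check_codec` is made up for this example, and Python's standard `base64` module stands in for the implementations under test; an in-house codec would need to be wrapped so that it accepts and returns `bytes`.

```python
import base64

# The RFC 4648 section 10 test vectors for the standard base64 alphabet.
RFC4648_VECTORS = [
    (b"", b""),
    (b"f", b"Zg=="),
    (b"fo", b"Zm8="),
    (b"foo", b"Zm9v"),
    (b"foob", b"Zm9vYg=="),
    (b"fooba", b"Zm9vYmE="),
    (b"foobar", b"Zm9vYmFy"),
]


def check_codec(encode, decode):
    """Run one encode/decode pair against the vectors.

    Hypothetical helper: returns a list of (direction, input) tuples
    for every vector that did not round-trip as the RFC specifies.
    """
    failures = []
    for plain, encoded in RFC4648_VECTORS:
        if encode(plain) != encoded:
            failures.append(("encode", plain))
        if decode(encoded) != plain:
            failures.append(("decode", encoded))
    return failures


if __name__ == "__main__":
    # The standard library codec is used here only as a stand-in.
    print(check_codec(base64.b64encode, base64.b64decode))
```

The vectors cover input lengths 0 through 6, so each of the three padding cases (none, one =, two =) is hit twice. They do not exercise every character of the alphabet, so they are a starting point rather than a complete test.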

Some of your implementations may produce base64 output that differs only in line breaking: whether line breaks are inserted at all, at what line length, and which line terminator is used. You would need additional testing to determine whether you can safely replace an implementation that uses one style with one that uses another. In particular, a decoder might make assumptions about line length or line termination.
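
One way to check whether two encoders differ only in line breaking is to compare their output after stripping whitespace. The sketch below assumes both encoders take and return `bytes`. The names `strip_whitespace` and `differs_only_by_line_breaks` are invented for illustration, and the standard library's `b64encode` (no line breaks) and `encodebytes` (a newline after every 76 characters) stand in for two in-house implementations.

```python
import base64


def strip_whitespace(encoded: bytes) -> bytes:
    """Remove spaces, tabs, CR and LF, leaving alphabet and padding."""
    return bytes(b for b in encoded if b not in b" \t\r\n")


def differs_only_by_line_breaks(encode_a, encode_b, samples) -> bool:
    """Return True if both encoders agree on every sample once
    whitespace is ignored. Hypothetical helper for this example."""
    return all(
        strip_whitespace(encode_a(data)) == strip_whitespace(encode_b(data))
        for data in samples
    )


# Inputs of every length from 0 to 199 bytes, so all three padding
# cases and several line-wrap boundaries are covered.
samples = [bytes(range(n)) for n in range(200)]

if __name__ == "__main__":
    print(differs_only_by_line_breaks(
        base64.b64encode, base64.encodebytes, samples))
```

A passing comparison only says the encoders agree apart from whitespace. Each decoder still has to be fed both wrapped and unwrapped input separately to find out whether it tolerates the other style.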

Hugh Brackett answered Nov 13 '22 13:11