
Determine if string is base64-encoded twice

Tags: regex, php, base64

Is there any way to determine if a string is base64-encoded twice?

For example, is there a regex pattern that I can use with preg_match to do this?

Bilal Imdad asked Apr 04 '18 12:04



1 Answer

(Practical answer.) Don't use regex. Decode your string using base64_decode() with its optional $strict parameter set to true and see if the result matches the format you expect. Or simply try to decode it as many times as it permits, e.g.:

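// Decode $data up to $count times, stopping at the first layer
// that is not valid base64 in strict mode.
function base64_decode_multiple(string $data, int $count = 2): string {
    while ($count-- > 0 && ($decoded = base64_decode($data, true)) !== false) {
        $data = $decoded;
    }
    return $data;
}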
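
If what you need is a yes/no answer rather than the decoded data, the same loop can count layers instead. This is only a sketch: base64_layers() is a hypothetical helper name, not part of PHP. Keep in mind that a successful strict decode only shows that the string is shaped like base64; an ordinary word such as "test" decodes as well, so the count is an upper bound on how many times the data was really encoded.

```php
<?php
// Hypothetical helper (not a PHP built-in): count how many consecutive
// strict base64 decodes succeed, up to $max layers.
function base64_layers(string $data, int $max = 2): int {
    $layers = 0;
    // The empty string decodes to itself, so stop there instead of counting it.
    while ($layers < $max
        && $data !== ''
        && ($decoded = base64_decode($data, true)) !== false) {
        $data = $decoded;
        $layers++;
    }
    return $layers;
}

var_dump(base64_layers(base64_encode(base64_encode('hello')))); // int(2)
var_dump(base64_layers(base64_encode('hello!')));               // int(1)
var_dump(base64_layers('not base64!'));                         // int(0)
```

A result equal to $max means the string could have been encoded at least that many times, nothing more.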

(Theoretical answer.) Double-base-64-encoded strings are regular, because for any fixed-size chunk there is only a finite number of byte sequences that properly base64-encode part of a base64-encoded message.

You can check if something is base64-encoded once since you can validate each set of four characters. The last four bytes in a base64-encoded message may be a special case because =s are used as padding. This can be expressed with the regular expression:

<char>           := [A-Za-z0-9+/]
<chunk>          := <char>{4}
<end-chunk>      := <char>{2} "==" | <char>{3} "="
<base64-encoded> := <chunk>* <end-chunk>?
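
Since the question asks about preg_match, here is one way the grammar above might be written as a PCRE pattern, with the end chunk restricted to two characters plus "==" or three characters plus "=". looks_like_base64() is a hypothetical name, and the check covers the shape only, not whether the decoded bytes mean anything.

```php
<?php
// Hypothetical helper: checks only the *shape* of padded base64.
function looks_like_base64(string $s): bool {
    $c = '[A-Za-z0-9+/]';
    // <chunk>* <end-chunk>?  The D modifier stops "$" from matching
    // before a trailing newline.
    $pattern = '~^(?:' . $c . '{4})*(?:' . $c . '{2}==|' . $c . '{3}=)?$~D';
    return preg_match($pattern, $s) === 1;
}

var_dump(looks_like_base64('QUFBQQ==')); // bool(true)
var_dump(looks_like_base64('AA=A'));     // bool(false)
```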

You can also determine if something is base64-encoded twice using regular expressions, but the solution is not trivial or pretty, since it's not enough to check 4 bytes at a time.

Example: "QUFBQQ==" base64-decodes to "AAAA", which in turn base64-decodes to three NUL bytes:

$ echo -n "QUFBQQ==" | base64 -d | xxd
00000000: 4141 4141                                AAAA

$ echo -n "AAAA" | base64 -d | xxd
00000000: 0000 00                                  ...
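
The same two steps in PHP, as a small sketch using base64_decode() in strict mode and bin2hex() in place of xxd:

```php
<?php
$once  = base64_decode('QUFBQQ==', true); // first layer
$twice = base64_decode($once, true);      // second layer

var_dump($once);            // string(4) "AAAA"
echo bin2hex($twice), "\n"; // 000000
```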

At this point we could enumerate all double-base64-encodings where the inner base64 string is exactly 4 characters from the base64 alphabet ("AAAA", "AAAB", "AAAC", "AAAD", etc.) and minimize this:

<ugly 4> := QUFBQQ== | QUFBQg== | QUFBQw== | QUFBRA== | ...

And we could enumerate the 4-byte chunks that each encode three characters of the inner base64 string (the cases that don't involve padding with =) and minimize that:

<chunk 4> := QUFB | QkFB | Q0FB | REFB | ...
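
To get a feel for these enumerations, the alternatives can be generated rather than typed out. The sketch below is an illustration only: BASE64_ALPHABET, encoded_combinations() and first_alternatives() are hypothetical names. It yields base64_encode() of every string of a given length over the base64 alphabet, lazily, since there are 64^3 = 262144 three-character strings and 64^4 = 16777216 four-character ones.

```php
<?php
const BASE64_ALPHABET =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical generator: yields base64_encode() of every string of $length
// characters from the base64 alphabet, last character varying fastest.
function encoded_combinations(int $length): Generator {
    $total = 64 ** $length;
    for ($n = 0; $n < $total; $n++) {
        $inner = '';
        // Write $n in base 64 using the alphabet as digits.
        for ($i = 0, $rest = $n; $i < $length; $i++, $rest = intdiv($rest, 64)) {
            $inner = BASE64_ALPHABET[$rest % 64] . $inner;
        }
        yield base64_encode($inner);
    }
}

// Collect only the first $limit alternatives.
function first_alternatives(int $length, int $limit): array {
    $out = [];
    foreach (encoded_combinations($length) as $alt) {
        if (count($out) >= $limit) {
            break;
        }
        $out[] = $alt;
    }
    return $out;
}

echo implode(' | ', first_alternatives(4, 4)), "\n"; // QUFBQQ== | QUFBQg== | QUFBQw== | QUFBRA==
echo implode(' | ', first_alternatives(3, 4)), "\n"; // QUFB | QUFC | QUFD | QUFE
```

The three-character alternatives come out in a different order than the list above, but the set is the same.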

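Each <chunk 4> covers three characters of the inner base64 string, whose length is a multiple of 4, so the two only line up every 12 inner characters, i.e. every 16 outer characters. One partition (the pretty one) of double-base64-encoded strings will not contain =s at the end; their lengths are a multiple of 16:

<pretty double-base64-encoded> := (<chunk 4>{4})*

Another partition of double-base64-encoded strings will have lengths that are 8 more than a multiple of 16 (8, 24, 40, etc.); they can be thought of as pretty ones with an ugly bit at the end:

<ugly double-base64-encoded> := (<chunk 4>{4})* <ugly 4>

The remaining cases (lengths that are 12 more than a multiple of 16, and inner strings that themselves end in =) need further enumerated endings of the same kind. We could then construct a combined regular expression, shown here for the two partitions above: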

<double-base64-encoded> := <pretty double-base64-encoded>
                         | <ugly double-base64-encoded>

As I said, you probably don't want to go through all this mess just because double-base64-encoded messages are regular, just like you don't want to use a regex to check if an integer is within some finite interval. Also, this is a good example of getting the wrong answer when you should have been asking another question. :-)

sshine answered Sep 28 '22 06:09