Is there any way to determine if a string is base64-encoded twice? For example, is there a regex pattern that I can use with preg_match to do this?


Determine if string is base64-encoded twice


(Practical answer.) Don't use regex. Decode your string with base64_decode(), setting its optional $strict parameter to true, and check whether the result matches the format you expect. Or simply try to decode it as many times as it permits, e.g.:

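// Decode $data up to $count times, stopping early as soon as strict decoding
// fails; returns the last successfully decoded value (or $data unchanged).
// Note: any string that happens to be valid base64 (e.g. "test") is decoded too.
function base64_decode_multiple(string $data, int $count = 2): string {
    while ($count-- > 0 && ($decoded = base64_decode($data, true)) !== false) {
        $data = $decoded;
    }
    return $data;
}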
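To answer "is this encoded twice?" directly, rather than just decoding as far as possible, the checks from the paragraph above can be combined: decode strictly, confirm that the intermediate result itself is base64, and decode again. The sketch below is one way to do that; the names is_canonical_base64 and is_base64_twice are hypothetical, and the round-trip comparison with base64_encode() is an added assumption that rejects input base64_decode() tolerates even in strict mode but base64_encode() would never produce (omitted padding, for example).

```php
<?php

// Hypothetical helper: true if $s is base64 exactly as base64_encode() would write it.
function is_canonical_base64(string $s): bool
{
    // Strict mode returns false for characters outside the base64 alphabet;
    // re-encoding additionally rejects non-canonical forms such as missing padding.
    $decoded = base64_decode($s, true);
    return $decoded !== false && base64_encode($decoded) === $s;
}

// Hypothetical helper: true if $s is the base64 encoding of a base64 string.
function is_base64_twice(string $s): bool
{
    // Design choice: the empty string is not treated as a double encoding.
    if ($s === '' || !is_canonical_base64($s)) {
        return false;
    }
    // Safe: is_canonical_base64() already confirmed that this decodes.
    $inner = base64_decode($s, true);
    return $inner !== '' && is_canonical_base64($inner);
}

var_dump(is_base64_twice('QUFBQQ=='));  // bool(true): "QUFBQQ==" -> "AAAA" -> three NUL bytes
var_dump(is_base64_twice('AAAA'));      // bool(false): three NUL bytes are not base64
```

Like the regex approach, this can only say that a string is *consistent* with being encoded twice; short inputs such as "AAAA" are valid base64 by accident.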

(Theoretical answer.) Double-base64-encoded strings form a regular language, because only a finite number of fixed-size character groups can appear in a string that properly base64-encodes a base64-encoded message, so checking one needs only a bounded amount of memory.

You can check whether something is base64-encoded once, because you can validate it four characters at a time. Only the last group of four is a special case, since it may end in one or two = characters used as padding. As a regular expression:

<char>           := [A-Za-z0-9+/]
<chunk>          := <char>{4}
<end-chunk>      := <char>{2} "==" | <char>{3} "="
<base64-encoded> := <chunk>* <end-chunk>?
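For the preg_match part of the question, the single-encoding grammar translates directly into a PCRE pattern. This is a sketch: it writes the padding cases as literal == and = so that = can only appear at the very end, and it anchors with \z rather than $ so that a trailing newline is not accepted. The constant BASE64_PATTERN and the function looks_like_base64 are hypothetical names.

```php
<?php

// <chunk>* followed by an optional padded <end-chunk>, anchored at both ends.
// Note that the empty string also matches.
const BASE64_PATTERN = '~^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z~';

// Hypothetical helper: a purely syntactic check, nothing is decoded.
function looks_like_base64(string $s): bool
{
    return preg_match(BASE64_PATTERN, $s) === 1;
}

var_dump(looks_like_base64('QUFBQQ=='));  // bool(true)
var_dump(looks_like_base64('QUFBQQ='));   // bool(false): length is not a multiple of 4
var_dump(looks_like_base64('QU=BQQ=='));  // bool(false): '=' in the middle
```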

You can also determine if something is base64-encoded twice using regular expressions, but the solution is not trivial or pretty, since it's not enough to check 4 bytes at a time.

Example: "QUFBQQ==" base64-decodes to "AAAA", which in turn base64-decodes to three NUL bytes:

$ echo -n "QUFBQQ==" | base64 -d | xxd
00000000: 4141 4141                                AAAA

$ echo -n "AAAA" | base64 -d | xxd
00000000: 0000 00                                  ...
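The same two decoding steps in PHP, with bin2hex() standing in for xxd:

```php
<?php

$outer = 'QUFBQQ==';
$inner = base64_decode($outer, true);  // first decode: "AAAA"
$bytes = base64_decode($inner, true);  // second decode: three NUL bytes

echo $inner, "\n";           // AAAA
echo bin2hex($bytes), "\n";  // 000000
```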

At this point we could enumerate the double encodings of all 4-character strings over the base64 alphabet ("AAAA", "AAAB", "AAAC", "AAAD", etc.) and minimize the resulting alternation:

<ugly 4> := QUFBQQ== | QUFBQg== | QUFBQw== | QUFBRA== | ...
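The alternatives in <ugly 4> can be generated with base64_encode() instead of by hand. The sketch below reproduces the first four listed above and also counts them: with 64 choices for each of the four inner characters, there are 64^4 = 16,777,216 alternatives before minimization.

```php
<?php

// The 64-symbol base64 alphabet, in encoding order.
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Double encodings of "AAAA", "AAAB", "AAAC" and "AAAD":
// the first few <ugly 4> alternatives.
$branches = [];
for ($i = 0; $i < 4; $i++) {
    $branches[] = base64_encode('AAA' . $alphabet[$i]);
}
echo implode(' | ', $branches), " | ...\n";
// QUFBQQ== | QUFBQg== | QUFBQw== | QUFBRA== | ...

// One alternative per 4-character inner string over the alphabet.
echo strlen($alphabet) ** 4, "\n";  // 16777216
```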

And we could enumerate every unpadded 4-character group that decodes to three characters of the base64 alphabet ("AAA", "BAA", "CAA", "DAA", etc.), which is what the body of a longer double encoding is made of, and minimize that:

<chunk 4> := QUFB | QkFB | Q0FB | REFB | ...
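Rather than enumerating <chunk 4> by hand, membership can be tested directly: a group belongs to it when it consists of four base64 characters that decode to three bytes which are themselves base64 characters. A sketch of that predicate follows; the name is_chunk4 is hypothetical.

```php
<?php

// Hypothetical predicate for <chunk 4>: four base64 characters that decode
// to three bytes which are themselves base64 characters.
function is_chunk4(string $group): bool
{
    if (preg_match('~^[A-Za-z0-9+/]{4}\z~', $group) !== 1) {
        return false;
    }
    // Four alphabet characters always decode to exactly three bytes.
    $decoded = base64_decode($group, true);
    return preg_match('~^[A-Za-z0-9+/]{3}\z~', $decoded) === 1;
}

var_dump(is_chunk4('QUFB'));  // bool(true):  decodes to "AAA"
var_dump(is_chunk4('REFB'));  // bool(true):  decodes to "DAA"
var_dump(is_chunk4('AAAA'));  // bool(false): decodes to three NUL bytes
```

Since every three-character inner string corresponds to exactly one such group, <chunk 4> has 64^3 = 262,144 alternatives before minimization.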

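One partition (the pretty one) of double-base64-encoded strings will not contain =s at the end. The outer encoding needs no padding only when the inner string's length is a multiple of 3 as well as of 4, so their lengths are multiples of 16:

<pretty double-base64-encoded> := <chunk 4>{4}*

Another partition of double-base64-encoded strings ends in padding; their lengths are 8 or 12 more than a multiple of 16 (8, 12, 24, 28, etc.), and lengths such as 4 or 20 never occur. They can be thought of as pretty ones with an ugly bit at the end: either an <ugly 4> (8 characters ending in ==, encoding a final 4-character inner group) or an analogous <ugly 8> (12 characters ending in =, encoding the final 8 inner characters). For brevity, this glosses over inner strings that themselves end in =, whose last group needs its own variants:

<ugly double-base64-encoded> := <chunk 4>{4}* (<ugly 4> | <ugly 8>)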
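Which lengths a double encoding can actually have is easy to check: base64-encode inner strings of length 4, 8, 12, ... once more and measure the result. The sketch below uses runs of "A" as the inner strings, an arbitrary choice since only their length affects the length of the output; the helper name outer_length is hypothetical.

```php
<?php

// Hypothetical helper: length of base64_encode() output for an inner string
// of $innerLength bytes. Only the length matters, so a run of 'A's will do.
function outer_length(int $innerLength): int
{
    return strlen(base64_encode(str_repeat('A', $innerLength)));
}

// Inner strings are valid base64, so their lengths are multiples of 4.
foreach ([4, 8, 12, 16, 20, 24] as $n) {
    $len = outer_length($n);
    printf("inner %2d -> outer %2d (length mod 16 = %2d)\n", $n, $len, $len % 16);
}
```

The outer lengths come out as 8, 12, 16, 24, 28 and 32, cycling through 8, 12 and 0 modulo 16; only the multiples of 16 are produced without = padding.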

We could then construct a combined regular expression:

<double-base64-encoded> := <pretty double-base64-encoded>
                         | <ugly double-base64-encoded>

As I said, you probably don't want to go through all this mess just because double-base64-encoded messages are regular, just as you wouldn't write a regex to check whether an integer lies within some finite interval, even though you could. Also, this is a good example of getting the wrong answer when you should have been asking another question. :-)

answered Sep 28 '22 06:09

sshine


Tags: regex, php, base64

Bilal Imdad
