Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shortest possible encoded string with decode possibility (shorten url) using only PHP

Tags:

I'm looking for a method that encodes a string to the shortest possible length and lets it be decodable (pure PHP, no SQL). I have working script, but I'm unsatisfied with the length of the encoded string.

Scenario

Link to an image (it depends on the file resolution I want to show to the user):

  • www.mysite.com/share/index.php?img=/dir/dir/hi-res-img.jpg&w=700&h=500

Encoded link (so the user can't guess how to get the larger image):

  • www.mysite.com/share/encodedQUERYstring

So, basically I'd like to encode only the search query part of the URL:

  • img=/dir/dir/hi-res-img.jpg&w=700&h=500

The method I use right now will encode the above query string to:

  • y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA

The method I use is:

 $raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';

 $encoded_query_string = base64_encode(gzdeflate($raw_query_string));
 $decoded_query_string = gzinflate(base64_decode($encoded_query_string));

How do I shorten the encoded result and still have the possibility to decode it using only PHP?

like image 435
Artur Filipiak Avatar asked Jan 13 '15 20:01

Artur Filipiak


3 Answers

I suspect that you will need to think more about your method of hashing if you don't want it to be decodable by the user. The issue with Base64 is that a Base64 string looks like a base64 string. There's a good chance that someone that's savvy enough to be looking at your page source will probably recognise it too.

Part one:

a method that encodes an string to shortest possible length

If you're flexible on your URL vocabulary/characters, this will be a good starting place. Since gzip makes a lot of its gains using back references, there is little point as the string is so short.

Consider your example - you've only saved 2 bytes in the compression, which are lost again in Base64 padding:

Non-gzipped: string(52) "aW1nPS9kaXIvZGlyL2hpLXJlcy1pbWcuanBnJnc9NzAwJmg9NTAw"

Gzipped: string(52) "y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA=="

If you reduce your vocabulary size, this will naturally allow you better compression. Let's say we remove some redundant information.

Take a look at the functions:

function compress($input, $ascii_offset = 38){
    $input = strtoupper($input);
    $output = '';
    //We can try for a 4:3 (8:6) compression (roughly), 24 bits for 4 characters
    foreach(str_split($input, 4) as $chunk) {
        $chunk = str_pad($chunk, 4, '=');

        $int_24 = 0;
        for($i=0; $i<4; $i++){
            //Shift the output to the left 6 bits
            $int_24 <<= 6;

            //Add the next 6 bits
            //Discard the leading ASCII chars, i.e make
            $int_24 |= (ord($chunk[$i]) - $ascii_offset) & 0b111111;
        }

        //Here we take the 4 sets of 6 apart in 3 sets of 8
        for($i=0; $i<3; $i++) {
            $output = pack('C', $int_24) . $output;
            $int_24 >>= 8;
        }
    }

    return $output;
}

And

function decompress($input, $ascii_offset = 38) {

    $output = '';
    foreach(str_split($input, 3) as $chunk) {

        //Reassemble the 24 bit ints from 3 bytes
        $int_24 = 0;
        foreach(unpack('C*', $chunk) as $char) {
            $int_24 <<= 8;
            $int_24 |= $char & 0b11111111;
        }

        //Expand the 24 bits to 4 sets of 6, and take their character values
        for($i = 0; $i < 4; $i++) {
            $output = chr($ascii_offset + ($int_24 & 0b111111)) . $output;
            $int_24 >>= 6;
        }
    }

    //Make lowercase again and trim off the padding.
    return strtolower(rtrim($output, '='));
}

It is basically a removal of redundant information, followed by the compression of 4 bytes into 3. This is achieved by effectively having a 6-bit subset of the ASCII table. This window is moved so that the offset starts at useful characters and includes all the characters you're currently using.

With the offset I've used, you can use anything from ASCII 38 to 102. This gives you a resulting string of 30 bytes, that's a 9-byte (24%) compression! Unfortunately, you'll need to make it URL-safe (probably with base64), which brings it back up to 40 bytes.

I think at this point, you're pretty safe to assume that you've reached the "security through obscurity" level required to stop 99.9% of people. Let's continue though, to the second part of your question

so the user can't guess how to get the larger image

It's arguable that this is already solved with the above, but you need to pass this through a secret on the server, preferably with PHP's OpenSSL interface. The following code shows the complete usage flow of functions above and the encryption:

$method = 'AES-256-CBC';
$secret = base64_decode('tvFD4Vl6Pu2CmqdKYOhIkEQ8ZO4XA4D8CLowBpLSCvA=');
$iv = base64_decode('AVoIW0Zs2YY2zFm5fazLfg==');

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
var_dump($input);

$compressed = compress($input);
var_dump($compressed);

$encrypted = openssl_encrypt($compressed, $method, $secret, false, $iv);
var_dump($encrypted);

$decrypted = openssl_decrypt($encrypted, $method, $secret, false, $iv);
var_dump($decrypted);

$decompressed = decompress($compressed);
var_dump($decompressed);

The output of this script is the following:

string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(44) "xozYGselci9i70cTdmpvWkrYvGN9AmA7djc5eOcFoAM="
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

You'll see the whole cycle: compression → encryption → Base64 encode/decode → decryption → decompression. The output of this would be as close as possible as you could really get, at near the shortest length you could get.

Everything aside, I feel obliged to conclude this with the fact that it is theoretical only, and this was a nice challenge to think about. There are definitely better ways to achieve your desired result - I'll be the first to admit that my solution is a little bit absurd!

like image 57
calcinai Avatar answered Sep 20 '22 20:09

calcinai


Instead of encoding the URL, output a thumbnail copy of the original image. Here's what I'm thinking:

  1. Create a "map" for PHP by naming your pictures (the actual file names) using random characters. Random_bytes is a great place to start.

  2. Embed the desired resolution within the randomized URL string from #1.

  3. Use the imagecopyresampled function to copy the original image into the resolution you would like to output before outputting it out to the client's device.

So for example:

  1. Filename example (from bin2hex(random_bytes(6))): a1492fdbdcf2.jpg

  2. Resolution desired: 800x600. My new link could look like: http://myserver.com/?800a1492fdbdcf2600 or maybe http://myserfer.com/?a1492800fdbdc600f2 or maybe even http://myserver.com/?800a1492fdbdcf2=600 depending on where I choose to embed the resolution within the link

  3. PHP would know that the file name is a1492fdbdcf2.jpg, grab it, use the imagecopyresampled to copy to the resolution you want, and output it.

like image 30
JDW Avatar answered Sep 22 '22 20:09

JDW


Theory

In theory we need a short input character set and a large output character set. I will demonstrate it by the following example. We have the number 2468 as integer with 10 characters (0-9) as character set. We can convert it to the same number with base 2 (binary number system). Then we have a shorter character set (0 and 1) and the result is longer: 100110100100

But if we convert to hexadecimal number (base 16) with a character set of 16 (0-9 and A-F). Then we get a shorter result: 9A4

Practice

So in your case we have the following character set for the input:

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";

In total 41 characters: Numbers, lower cases and the special chars = / - . &

The character set for output is a bit tricky. We want use URL save characters only. I've grabbed them from here: Characters allowed in GET parameter

So our output character set is (73 characters):

$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

Numbers, lower and upper cases and some special characters.

We have more characters in our set for the output than for the input. Theory says we can short our input string. Check!

Coding

Now we need an encode function from base 41 to base 73. For that case I don't know a PHP function. Luckily we can grab the function 'convBase' from here: Convert an arbitrarily large number from any base to any base

<?php
function convBase($numberInput, $fromBaseInput, $toBaseInput)
{
    if ($fromBaseInput == $toBaseInput) return $numberInput;
    $fromBase = str_split($fromBaseInput, 1);
    $toBase = str_split($toBaseInput, 1);
    $number = str_split($numberInput, 1);
    $fromLen = strlen($fromBaseInput);
    $toLen = strlen($toBaseInput);
    $numberLen = strlen($numberInput);
    $retval = '';
    if ($toBaseInput == '0123456789')
    {
        $retval = 0;
        for ($i = 1;$i <= $numberLen; $i++)
            $retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase), bcpow($fromLen, $numberLen-$i)));
        return $retval;
    }
    if ($fromBaseInput != '0123456789')
        $base10 = convBase($numberInput, $fromBaseInput, '0123456789');
    else
        $base10 = $numberInput;
    if ($base10<strlen($toBaseInput))
        return $toBase[$base10];
    while($base10 != '0')
    {
        $retval = $toBase[bcmod($base10,$toLen)] . $retval;
        $base10 = bcdiv($base10, $toLen, 0);
    }
    return $retval;
}

Now we can shorten the URL. The final code is:

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
$encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
$decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

The encoded string has only 34 characters.

Optimizations

You can optimize the count of characters by

  • reduce the length of input string. Do you really need the overhead of URL parameter syntax? Maybe you can format your string as follows:

$input = '/dir/dir/hi-res-img.jpg,700,500';

This reduces the input itself and the input character set. Your reduced input character set is then:

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";

Final output:

string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"

string(31) "/dir/dir/hi-res-img.jpg,700,500"

  • reducing the input character set ;-). Maybe you can exclude some more characters? You can encode the numbers to characters first. Then your input character set can be reduced by 10!

  • increase your output character set. So the given set by me is googled within two minutes. Maybe you can use more URL save characters.

Security

Heads up: There is no cryptographically logic in the code. So if somebody guesses the character sets, he/she can decode the string easily. But you can shuffle the character sets (once). Then it is a bit harder for the attacker, but not really safe. Maybe it’s enough for your use case anyway.

like image 4
Timo Avatar answered Sep 22 '22 20:09

Timo