
PHP: Shorter/obscured encoding for a URL embedded in another URL?

I'm writing myself a script which basically lets me send a URL and two integer dimensions in the querystring of a single GET request. I'm using base64 to encode it, but it's pretty damn long and I'm concerned the URL may get too big.

Does anyone know an alternative, shorter method of doing this? It needs to be decodable when received in a GET request, so md5/sha1 are not possible.

Thanks for your time.


Edit: Sorry, I should have explained better. On our site we display screenshots of websites that get posted up for review. We have our own thumbnail/screenshot server. The image tag will contain an encoded string that stores the URL to take a screenshot of, plus the width/height of the image to show. I don't, however, want it in 'raw text' for the world to see. Obviously base64 can be decoded by anyone, but we don't want your average Joe picking up the URL path. Really, I need to fetch url, width and height in a single GET request.

Sk446 asked Sep 27 '10 20:09


3 Answers

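Since you are only using base64 to obfuscate the string, you could just obfuscate it with something else, like rot13 (or your own simple letter substitution function). So, urlencode(str_rot13($str)) to encode, and plain str_rot13($str) to decode: by the time the value reaches $_GET, PHP has already URL-decoded it, and running urldecode() a second time can corrupt strings that contain % or +.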
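A minimal sketch of that rot13 round trip. `encodeParam()`, `decodeParam()` and the `u` parameter name are hypothetical, and `parse_str()` stands in for the way PHP builds `$_GET` from the querystring:

```php
<?php
// Sender: obfuscate with rot13, then make it safe for a querystring.
function encodeParam(string $url): string {
    return urlencode(str_rot13($url));
}

// Receiver: $_GET values arrive already URL-decoded,
// so undoing rot13 is the only remaining step.
function decodeParam(string $fromGet): string {
    return str_rot13($fromGet);
}

$original = 'http://example.com/page?id=42';
$query = 'u=' . encodeParam($original);

// Simulate what PHP does when it fills $_GET.
parse_str($query, $get);
echo decodeParam($get['u']), "\n"; // http://example.com/page?id=42
```

rot13 only hides the letters; anyone who recognizes it can reverse it, which matches the "average Joe" requirement but nothing stronger.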

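Or, to just have a shorter base64-encoded string, you could compress the string before base64 encoding it: base64_encode(gzencode($str, 9)) to encode and gzdecode(base64_decode($str)) to decode. For strings this short, gzdeflate()/gzinflate() do better, since they skip the 18 bytes of gzip header and trailer that gzencode() adds. Keep in mind that standard base64 output contains +, / and =, which need escaping in a URL.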
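A sketch of compress-then-encode, under two assumptions beyond the answer: it uses raw DEFLATE (`gzdeflate()`/`gzinflate()`, which need the zlib extension), and it switches to the URL-safe Base64 alphabet (`-` and `_` instead of `+` and `/`, padding dropped) so the result needs no escaping. `packUrl()` and `unpackUrl()` are hypothetical names. For short URLs the saving may be small or even negative, so measure with your own data:

```php
<?php
// Compress with raw DEFLATE, then Base64URL-encode.
function packUrl(string $str): string {
    $b64 = base64_encode(gzdeflate($str, 9));
    // URL-safe alphabet, '=' padding dropped
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

// Reverse: restore the standard alphabet and padding, decode, decompress.
// Returns false on malformed input.
function unpackUrl(string $packed) {
    $b64 = strtr($packed, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    $bin = base64_decode($b64, true);
    // '@' suppresses gzinflate()'s warning on corrupt data; it returns false
    return $bin === false ? false : @gzinflate($bin);
}

$url = 'https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64';
$packed = packUrl($url);
echo strlen($url), ' chars -> ', strlen($packed), " chars\n";
echo unpackUrl($packed), "\n";
```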

Or, if this is primarily a security issue (you don't mind people seeing the URL, you just want to keep people from hacking it), you could pass these parameters as normal querystring variables, but with a hash appended to prevent tampering, e.g.:

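function getHash($url, $width, $height) {
  $secret = 'abcdefghijklmnopqrstuvwxyz whatever you want etc.';
  // HMAC instead of sha1(data . secret); the '|' separators stop url "a1" +
  // width 23 and url "a12" + width 3 from hashing the same input
  return hash_hmac('sha256', $url . '|' . (int) $width . '|' . (int) $height, $secret);
}

// So use this hash to construct your URL querystring:
$hash = getHash($url, $width, $height);
$urlQuerystring = '?url='.urlencode($url).'&width='.(int) $width.
                  '&height='.(int) $height.'&hash='.$hash;

// Then in your code that processes the URL, check the hash first
// ($_GET values are already URL-decoded by PHP)
$url    = (string) ($_GET['url'] ?? '');
$width  = (int) ($_GET['width'] ?? 0);
$height = (int) ($_GET['height'] ?? 0);
// hash_equals() is timing-safe and avoids the type juggling of !=
if (!hash_equals(getHash($url, $width, $height), (string) ($_GET['hash'] ?? ''))) {
  // URL is invalid: reject the request
  http_response_code(403);
  exit;
}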
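A self-contained round trip of that tamper check, as a sketch: `paramHash()`, `buildQuery()` and `isValid()` are hypothetical names, `SECRET` is a placeholder, and `parse_str()` stands in for the way PHP fills `$_GET`. It uses `hash_hmac()` and the timing-safe `hash_equals()`:

```php
<?php
const SECRET = 'replace with your own long random secret'; // placeholder

// Sign url/width/height; the separator keeps field boundaries unambiguous.
function paramHash(string $url, int $width, int $height): string {
    return hash_hmac('sha256', $url . '|' . $width . '|' . $height, SECRET);
}

// Build the querystring the image tag will point at.
function buildQuery(string $url, int $width, int $height): string {
    return http_build_query([
        'url'    => $url,
        'width'  => $width,
        'height' => $height,
        'hash'   => paramHash($url, $width, $height),
    ]);
}

// Check an already-decoded parameter array such as $_GET.
function isValid(array $get): bool {
    if (!isset($get['url'], $get['width'], $get['height'], $get['hash'])) {
        return false;
    }
    $expected = paramHash((string) $get['url'], (int) $get['width'], (int) $get['height']);
    return hash_equals($expected, (string) $get['hash']);
}

parse_str(buildQuery('http://example.com/', 550, 300), $get);
var_dump(isValid($get));   // bool(true)

$get['width'] = '9999';     // tampered parameter
var_dump(isValid($get));   // bool(false)
```

The URL stays readable; the hash only guarantees that nobody can ask your screenshot server for a URL or size you didn't sign.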

(Off topic: People are saying you should use POST instead of GET. If all these URLs are doing is fetching screenshots from your database to display (i.e. a search lookup), then GET is fine and correct. But if calling these URLs is actually performing an action like going to another site, making and storing the screenshot, then that's a POST. As their names suggest, GET is for retrieval; POST is for submitting data. If you were to use GET on an expensive operation like making the screenshot, you could end up DoS-ing your own site when Google etc. index these URLs.)

joelhardi answered Nov 04 '22 14:11


URLs are not meant for sending long strings of data, encoded or not. Past a certain point, when you're sending that much data through the URL, you should just start using POST or some form of local storage. FYI, IE has a URL limit of 2083 characters.


EDIT: I don't understand one thing: why aren't you caching the screenshots? It seems awfully resource-intensive to take a new screenshot every time somebody views a page with an IMG link to that URL.

Maybe your audience is small and resources are not an issue. However, if it is the opposite and this is in fact a public website, that will not scale very well. I know I'm going beyond what your original question asked, but this will solve your question and more.

As soon as the website is posted, store the URL in some sort of local storage, preferably SQL. I'll continue this example as if you chose SQL, but of course the implementation is your choice. I would have a primary key, a url field, a last_updated timestamp, and optionally an image thumbnail path.

By using local storage, you can serve the image from a cached copy on the server every time the page with the thumbnail is requested. That saves a significant amount of resources, and since those websites probably aren't updated very often, a cron job or script can refresh the screenshots for the whole database at a fixed interval. Then all you have to do is link directly to the image (again, this depends on your implementation), and none of this long URL string business is needed.
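A minimal sketch of that caching idea, swapping the SQL table for files on disk so it runs on its own. `cachedScreenshot()` is a hypothetical helper, `$takeScreenshot` stands in for whatever actually talks to your screenshot server, and the one-week refresh interval is an arbitrary assumption:

```php
<?php
// Return the path of a cached screenshot for $url, regenerating it only
// when it is missing or older than $ttl seconds.
function cachedScreenshot(string $url, string $cacheDir, callable $takeScreenshot, int $ttl = 604800): string {
    $path = $cacheDir . '/' . sha1($url) . '.png'; // stable file name per URL
    clearstatcache(true, $path);
    if (!is_file($path) || filemtime($path) < time() - $ttl) {
        // hypothetical callback: returns the image bytes for $url
        file_put_contents($path, $takeScreenshot($url));
    }
    return $path;
}

// Usage with a stub generator that counts how often it runs.
$calls = 0;
$stub = function (string $url) use (&$calls): string {
    $calls++;
    return 'fake image bytes for ' . $url;
};
$dir = sys_get_temp_dir() . '/shots_' . uniqid();
mkdir($dir);

$first  = cachedScreenshot('http://example.com/', $dir, $stub);
$second = cachedScreenshot('http://example.com/', $dir, $stub); // cache hit
echo $calls, "\n"; // 1
```

The image tag can then point straight at the cached file (or a script that serves it), so only an ID or short path travels in the URL.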

OR, just take the easy way and do it client side with http://www.snap.com/

theAlexPoon answered Nov 04 '22 13:11


It sounds like your goals are 1. to visually obscure a URL, and 2. to generally encode the data compactly for use in a URL.

First, we need to obscure the URL. Since URLs use much of the Base64 alphabet, any encoding that produces binary (which then has to be Base64-encoded) will likely just increase the size. It's best to keep the alphabet in the URL-safe range, so urlencode() needs to escape as little as possible. That is, you want this:

/**
 * Rot35 for URLs. To avoid increasing size during urlencode(), commonly encoded
 * chars are mapped to more rarely used chars (end of the uppercase alpha).
 *
 * @param string $url
 * @return string
 */
function rotUrl($url) {
    return strtr($url,
        'abcdefghijklmnopqrstuvwxyz0-:/?=&%#123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0-:/?=&%#');
}
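Because the replacement string is the original 70-character alphabet rotated by 35 positions, applying `rotUrl()` twice returns the input, so the same function also decodes. A quick check (the function is repeated so the snippet runs on its own; the sample path is the one used further down):

```php
<?php
function rotUrl($url) {
    return strtr($url,
        'abcdefghijklmnopqrstuvwxyz0-:/?=&%#123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0-:/?=&%#');
}

$path = 'en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64';
$rotated = rotUrl($path);

var_dump(rotUrl($rotated) === $path);       // bool(true): it is its own inverse
var_dump(urlencode($rotated) === $rotated); // bool(true): nothing needs escaping
```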

Now, to save bytes, we can encode the URL scheme into one char (say, h for HTTP, H for HTTPS) and convert the dimensions into base 32. Wrapping this up:

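function obscure($width, $height, $url) {
    $dimensions = base_convert($width, 10, 32) . "."
                . base_convert($height, 10, 32) . ".";
    if (!preg_match('@^(https?)://(.+)@', $url, $m)) {
        return false; // only http:// and https:// URLs are supported
    }
    // one char for the scheme, then the rotated host + path
    return $dimensions . (($m[1] === 'http') ? 'h' : 'H') . rotUrl($m[2]);
}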

function unobscure($str) { /* exercise for the reader! */ }

$url = 'https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64';
$obs = obscure(550, 300, $url);
// h6.9c.H5E.N9B9G5491.FI7UNU9E45O.G8GVK9KC5W-G5391CYcj-51I38XJ51I38Wk1J5fd

Since we avoided non-URL-safe chars, putting this in a querystring (with urlencode) barely grows it (in this case, not at all).
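Since `unobscure()` is left as an exercise, here is one possible version. It is a sketch that assumes exactly the format `obscure()` produces: two base-32 numbers (which `base_convert()` writes with the digits `0-9` and `a-v`), each followed by a dot, then `h` or `H`, then the rotated host and path. Returning `false` on malformed input is an added assumption. Both helpers are repeated so it runs on its own:

```php
<?php
function rotUrl($url) {
    return strtr($url,
        'abcdefghijklmnopqrstuvwxyz0-:/?=&%#123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0-:/?=&%#');
}

function obscure($width, $height, $url) {
    $dimensions = base_convert($width, 10, 32) . "."
                . base_convert($height, 10, 32) . ".";
    preg_match('@^(https?)://(.+)@', $url, $m);
    return $dimensions . (($m[1] === 'http') ? 'h' : 'H') . rotUrl($m[2]);
}

// Inverse of obscure(): returns [width, height, url], or false.
function unobscure($str) {
    // width "." height "." scheme flag, then the rotated remainder
    if (!preg_match('@^([0-9a-v]+)\.([0-9a-v]+)\.([hH])(.+)$@s', $str, $m)) {
        return false;
    }
    $width  = (int) base_convert($m[1], 32, 10);
    $height = (int) base_convert($m[2], 32, 10);
    $scheme = ($m[3] === 'h') ? 'http' : 'https';
    // rotUrl() is its own inverse, so it also decodes
    return [$width, $height, $scheme . '://' . rotUrl($m[4])];
}

$url = 'https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Base64';
list($w, $h, $u) = unobscure(obscure(550, 300, $url));
// $w === 550, $h === 300, $u === $url
```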

Additionally, you might want to sign this string so people who know the encoding still can't specify their own parameters via the URL. For this you'd use an HMAC and Base64URL-encode it. To save space you can keep just a prefix of the MAC, since each Base64 character carries 6 bits. sign() (below) appends an 8-character MAC (48 bits of the hash):

function sign($key, $data) {
    return $data . _hmac($key, $data, 8);
}
function verify($key, $signed) {
    if (strlen($signed) < 8) {
        return false;
    }
    $mac = substr($signed, -8);
    $data = substr($signed, 0, -8);
    // hash_equals() compares in constant time, unlike ===
    return hash_equals(_hmac($key, $data, 8), $mac) ? $data : false;
}
function _hmac($key, $data, $macLength) {
    $mac = substr(base64_encode(hash_hmac('sha256', $data, $key, true)), 0, $macLength);
    return strtr($mac, '+/', '-_'); // for URL
}

$key = "Hello World!";
$signed = sign($key, $obs); // appends MAC: "w-jjw2Wm"

$obs = verify($key, $signed); // strips MAC and returns valid data, or FALSE
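A self-contained round trip of the signing step, as a sketch: the three functions are repeated with `hash_equals()` swapped in for `===` as a timing-safe comparison, and the payload is an arbitrary prefix of the obscured string above. It checks that an untouched string verifies and that changing the width is rejected:

```php
<?php
function _hmac($key, $data, $macLength) {
    $mac = substr(base64_encode(hash_hmac('sha256', $data, $key, true)), 0, $macLength);
    return strtr($mac, '+/', '-_'); // URL-safe Base64 alphabet
}
function sign($key, $data) {
    return $data . _hmac($key, $data, 8);
}
function verify($key, $signed) {
    if (strlen($signed) < 8) {
        return false;
    }
    $mac  = substr($signed, -8);
    $data = substr($signed, 0, -8);
    return hash_equals(_hmac($key, $data, 8), $mac) ? $data : false;
}

$key    = "Hello World!";
$data   = 'h6.9c.H5E.N9B9G5491.FI7';
$signed = sign($key, $data);

var_dump(verify($key, $signed) === $data); // bool(true)
$tampered = 'h7' . substr($signed, 2);      // width changed from 550 to 551
var_dump(verify($key, $tampered));          // bool(false)
```

A 48-bit MAC is plenty to stop casual guessing, but it is a size trade-off; keep more characters if forged requests would be costly.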

Update: a better RotURL function.

Steve Clay answered Nov 04 '22 13:11