
substitution cipher with different alphabet length

I would like to implement a simple substitution cipher to mask private ids in URLs.

I know what my IDs will look like (a combination of uppercase ASCII letters, digits and underscores), and they will be rather long, as they are composite keys. I would like to use a larger alphabet to shorten the resulting codes (upper- and lowercase ASCII letters, digits and nothing else). So my incoming alphabet would be

[A-Z0-9_] (37 chars)

and my outgoing alphabet would be

[A-Za-z0-9] (62 chars)

so a reasonable amount of compression would be available.

Let's say my URLs look like this:

/my/page/GFZHFFFZFZTFZTF_24_F34

and I want them to look like this instead:

/my/page/Ft32zfegZFV5

Obviously both arrays would be shuffled to bring some random order in.

This does not have to be secure. If someone figures it out: fine, but I don't want the scheme to be obvious.

My desired solution would be to read the string as an integer in radix 37, convert that integer to radix 62, and write it out using the second alphabet. Is there any sample code available that does something similar? Integer.parseInt() has similar logic, but it is hard-coded to the standard digit symbols.

Any ideas?

I am using Java to implement this but code or pseudo-code in any other language is of course also helpful.

Asked May 19 '10 at 08:05 by Sean Patrick Floyd




1 Answer

Inexplicably Character.MAX_RADIX is only 36, but you can always write your own base conversion routine. The following implementation isn't high-performance, but it should be a good starting point:

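import java.math.BigInteger;

public class BaseConvert {
    // Parses s as a number in the given base; symbols.charAt(i) is the digit with value i.
    static BigInteger fromString(String s, int base, String symbols) {
        BigInteger num = BigInteger.ZERO;
        BigInteger biBase = BigInteger.valueOf(base);
        for (char ch : s.toCharArray()) {
            int digit = symbols.indexOf(ch);
            // indexOf returns -1 for an unknown symbol, which would silently corrupt the result
            if (digit < 0 || digit >= base) {
                throw new IllegalArgumentException("Invalid symbol: " + ch);
            }
            num = num.multiply(biBase).add(BigInteger.valueOf(digit));
        }
        return num;
    }
    // Writes a non-negative num in the given base, most significant digit first.
    static String toString(BigInteger num, int base, String symbols) {
        if (num.signum() == 0) {
            // zero is a single zero digit, not the empty string
            return String.valueOf(symbols.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        BigInteger biBase = BigInteger.valueOf(base);
        while (!num.equals(BigInteger.ZERO)) {
            // digits come out least significant first, so the buffer is reversed at the end
            sb.append(symbols.charAt(num.mod(biBase).intValue()));
            num = num.divide(biBase);
        }
        return sb.reverse().toString();
    }
    // Builds the string of all characters from 'from' to 'to', inclusive.
    static String span(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char ch = from; ch <= to; ch++) {
            sb.append(ch);
        }
        return sb.toString();
    }
}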

Then you can have a main() test harness like the following:

public static void main(String[] args) {
    final String SYMBOLS_AZ09_ = span('A','Z') + span('0','9') + "_";
    final String SYMBOLS_09AZ = span('0','9') + span('A','Z');
    final String SYMBOLS_AZaz09 = span('A','Z') + span('a','z') + span('0','9');

    BigInteger n = fromString("GFZHFFFZFZTFZTF_24_F34", 37, SYMBOLS_AZ09_);

    // let's convert back to base 37 first...
    System.out.println(toString(n, 37, SYMBOLS_AZ09_));
    // prints "GFZHFFFZFZTFZTF_24_F34"

    // now let's see what it looks like in base 62...       
    System.out.println(toString(n, 62, SYMBOLS_AZaz09));
    // prints "ctJvrR5kII1vdHKvjA4"

    // now let's test with something we're more familiar with...
    System.out.println(fromString("CAFEBABE", 16, SYMBOLS_09AZ));
    // prints "3405691582"

    n = BigInteger.valueOf(3405691582L);
    System.out.println(toString(n, 16, SYMBOLS_09AZ));
    // prints "CAFEBABE"        
}
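
One caveat with reading an ID as a number: the first symbol of the alphabet acts as the digit zero, so leading occurrences of it (here 'A') vanish on the round trip, just as 007 and 7 are the same number. The following is a minimal sketch of one possible workaround, prepending a fixed nonzero sentinel digit before encoding and dropping it after decoding. The class and method names (LeadingSymbolDemo, encode, decode) are hypothetical and not part of the code above, and the alphabets are left unshuffled for readability.

```java
import java.math.BigInteger;

class LeadingSymbolDemo {
    // Unshuffled alphabets, for illustration only
    static final String IN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";                            // 37 symbols
    static final String OUT = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"; // 62 symbols

    // Reads s as a number whose base is the size of the alphabet.
    static BigInteger parse(String s, String symbols) {
        BigInteger base = BigInteger.valueOf(symbols.length());
        BigInteger num = BigInteger.ZERO;
        for (char ch : s.toCharArray()) {
            int digit = symbols.indexOf(ch);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid symbol: " + ch);
            }
            num = num.multiply(base).add(BigInteger.valueOf(digit));
        }
        return num;
    }

    // Writes a positive number using the alphabet, most significant digit first.
    static String format(BigInteger num, String symbols) {
        BigInteger base = BigInteger.valueOf(symbols.length());
        StringBuilder sb = new StringBuilder();
        while (num.signum() > 0) {
            BigInteger[] quotientAndRemainder = num.divideAndRemainder(base);
            sb.append(symbols.charAt(quotientAndRemainder[1].intValue()));
            num = quotientAndRemainder[0];
        }
        return sb.reverse().toString();
    }

    // Prepends the digit with value 1 so that leading zero digits of the id are preserved.
    static String encode(String id) {
        return format(parse(IN.charAt(1) + id, IN), OUT);
    }

    // Converts back to the input alphabet and strips the sentinel digit.
    static String decode(String code) {
        return format(parse(code, OUT), IN).substring(1);
    }

    public static void main(String[] args) {
        String id = "AAB_1";
        String code = encode(id);
        System.out.println(code + " -> " + decode(code));
    }
}
```

The sentinel costs at most one extra output character. Another option would be to store the ID length alongside the code and pad on decoding.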

Some observations

  • BigInteger is probably the easiest choice, since long IDs will exceed the range of long
  • You can shuffle the chars in each symbol String; just stick to one "secret" permutation per alphabet
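
As a rough sketch of the "secret" permutation idea, the symbol string could be shuffled once with a fixed seed. The class name SecretAlphabet, the method shuffle, and the seed value are all hypothetical. This is obfuscation only, in line with the question's "does not have to be secure".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SecretAlphabet {
    // Returns a permutation of symbols that depends only on the seed.
    static String shuffle(String symbols, long seed) {
        List<Character> chars = new ArrayList<>();
        for (char ch : symbols.toCharArray()) {
            chars.add(ch);
        }
        // A seeded Random makes the shuffle repeatable from run to run
        Collections.shuffle(chars, new Random(seed));
        StringBuilder sb = new StringBuilder(chars.size());
        for (char ch : chars) {
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";
        // 42L is an arbitrary example seed
        System.out.println(shuffle(in, 42L));
    }
}
```

It is probably safer to run this once and paste the resulting string into the code as a constant than to reshuffle at startup, so that already published URLs keep decoding even if the library's shuffle behaviour ever changes.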

Note regarding "50% compression"

You can't generally expect the base 62 string to be around half as long as the base 37 string, because the number of digits shrinks only logarithmically as the base grows. Here's Long.MAX_VALUE in base 10, 20, and 30:

    System.out.format("%s%n%s%n%s%n",
        Long.toString(Long.MAX_VALUE, 10), // "9223372036854775807"
        Long.toString(Long.MAX_VALUE, 20), // "5cbfjia3fh26ja7"
        Long.toString(Long.MAX_VALUE, 30)  // "hajppbc1fc207"
    );
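
To put a number on it for the alphabets in the question, the length ratio can be estimated as log(37) / log(62). The sketch below computes that ratio and the worst-case output length for a given input length. The class and method names (LengthRatio, ratio, digitsNeeded) are hypothetical helpers, not part of the code above.

```java
import java.math.BigInteger;

class LengthRatio {
    // Approximate output length per input symbol when converting between bases.
    static double ratio(int fromBase, int toBase) {
        return Math.log(fromBase) / Math.log(toBase);
    }

    // Digits needed in toBase to represent the largest number with len digits in fromBase.
    static int digitsNeeded(int len, int fromBase, int toBase) {
        BigInteger max = BigInteger.valueOf(fromBase).pow(len).subtract(BigInteger.ONE);
        BigInteger base = BigInteger.valueOf(toBase);
        int digits = 0;
        while (max.signum() > 0) {
            max = max.divide(base);
            digits++;
        }
        return digits;
    }

    public static void main(String[] args) {
        // Ratio for the question's alphabets, roughly 0.875
        System.out.printf("%.3f%n", ratio(37, 62));
        // Worst case for a 22-symbol ID such as the example in the question
        System.out.println(digitsNeeded(22, 37, 62));
    }
}
```

So a 22-character ID needs up to 20 characters in base 62, a saving of roughly 12% rather than 50%.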
Answered Sep 29 '22 at 16:09 by polygenelubricants