 

What's the name of this algorithm/routine?

I am writing a utility class which converts strings from one alphabet to another. This is useful in situations where you have a target alphabet you wish to use, with a restriction on the number of characters available. For example, if you can use lower case letters and numbers, but only 12 characters, it's possible to compress a timestamp from the alphabet 0123456789 -: into abcdefghijklmnopqrstuvwxyz0123456789, so 2010-10-29 13:14:00 might become 5hhyo9v8mk6avy (19 characters reduced to 14).

The class is designed to convert back and forth between alphabets, and also calculate the longest source string that can safely be stored in a target alphabet given a particular number of characters.
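The "longest source string" calculation can be sketched as follows. This is only an illustration, not the class from the question: the function name `max_source_length` is hypothetical, and it assumes fixed-length source strings whose encoded form is padded to a fixed width, so that every source string of length n needs its own target string.

```python
def max_source_length(source_size: int, target_size: int, target_chars: int) -> int:
    """Largest n such that source_size**n <= target_size**target_chars.

    Assumption: every source string of length n must map to a distinct
    target string of exactly target_chars characters.
    """
    capacity = target_size ** target_chars  # number of distinct target strings
    n = 0
    count = 1  # number of distinct source strings of length n
    while count * source_size <= capacity:
        count *= source_size
        n += 1
    return n


# 13 source symbols (digits, space, '-', ':'), 36 target symbols
print(max_source_length(13, 36, 14))
```

Python integers are arbitrary precision, so the comparison is exact and avoids the rounding risk of a floating-point logarithm.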

I was thinking of publishing this through Google Code; however, I'd obviously like other people to find it and use it, hence the question about what this is called. I've had to use this approach in two separate projects, with Bloomberg and a proprietary system, where you need to generate unique file names of a certain length but want to keep some plaintext, so GUIDs aren't appropriate.

Jon Freedman asked Oct 29 '10 15:10





2 Answers

Your examples bear some similarity to a dictionary coder with fixed source and target dictionaries. Fibonacci coding is also worth a look: it uses a fixed target dictionary of variable-length bit codes.

I think it also depends on whether it is important that your target alphabet has fixed-width entries. If you allow a fixed alphabet with variable-length codes, your compression ratio can get much closer to the entropy of the source. If the source alphabet distribution is known in advance, a static Huffman tree could easily be generated.
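As a rough sketch of the static Huffman idea, the following builds a code table from symbol weights that are known in advance. The function name `huffman_codes` and the example weights are made up for illustration, and the codes are emitted over a binary target alphabet (0 and 1), which is an assumption and not the target alphabet from the question.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict[str, float]) -> dict[str, str]:
    """Build a prefix-free binary code from known symbol weights."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical weights: 'a' is twice as likely as 'b' or 'c'.
print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.25}))
```

More frequent symbols get shorter codes, which is where the gain over fixed-width entries comes from.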

Nate answered Nov 15 '22 11:11



Here is a simple algorithm:

Assume that you don't have to transmit the alphabet used for encoding. You also don't use (or transmit) the probabilities of the input symbols, as standard compression schemes do, so we are simply re-encoding the data.

In this case we can treat the input data as a number written in a base equal to the cardinality of the input alphabet. We just have to change its representation to another base, which is a simple task.

EDITED example:

input alphabet: ABC, output alphabet: 0123456789

The message ABAC translates to 0102 in base 3, which is 11 (9 + 2) in base 10.

11 to base 10: 11

We could have a problem decoding it, because we don't know how many leading zeros to restore at the beginning of the decoded result, so we have to use one of these modifications:

1) somehow encode the size of the data in the stream.

2) use a dummy 1 at the start of the stream. With this, our example becomes:

10102 (base 3) = 81 + 9 + 2 = 92 (base 10).

Now after decoding we just have to ignore the first 1 (this also provides a basic error detection).
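The base change with a dummy leading 1 can be sketched in a few lines. The helper names (`parse`, `render`, `encode`, `decode`) are invented for this illustration, and the sketch assumes each alphabet has at least two distinct symbols, with a symbol's position in the alphabet string taken as its digit value.

```python
def parse(text: str, alphabet: str) -> int:
    """Read text as a number in base len(alphabet)."""
    base = len(alphabet)
    value = 0
    for ch in text:
        value = value * base + alphabet.index(ch)  # ValueError on unknown symbol
    return value


def render(value: int, alphabet: str) -> str:
    """Write a positive integer in base len(alphabet)."""
    base = len(alphabet)
    digits = []
    while value > 0:
        value, rem = divmod(value, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def encode(message: str, source: str, target: str) -> str:
    # source[1] is the symbol for digit 1: the dummy prefix keeps leading zeros.
    return render(parse(source[1] + message, source), target)


def decode(encoded: str, source: str, target: str) -> str:
    padded = render(parse(encoded, target), source)
    if not padded.startswith(source[1]):
        raise ValueError("missing dummy leading 1")  # basic error detection
    return padded[1:]


print(encode("ABAC", "ABC", "0123456789"))  # 10102 (base 3) -> 92
print(decode("92", "ABC", "0123456789"))    # -> ABAC
```

Because the whole message is treated as a single integer, this version has to read the entire input before it can emit any output.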

The main problem with this approach is that in most cases (when the GCD of the two alphabet sizes is 1) each newly encoded character completely changes the output. This is very inefficient and difficult to implement. We end up with arithmetic coding (actually a simplified version of it) as the best solution.

ruslik answered Nov 15 '22 11:11
