What's the name of this algorithm/routine?

Tags: string, algorithm, compression

I am writing a utility class which converts strings from one alphabet to another. This is useful in situations where you have a target alphabet you wish to use, with a restriction on the number of characters available. For example, if you can use lowercase letters and numbers, but only 12 characters, it's possible to compress a timestamp from the alphabet "0123456789 -:" into "abcdefghijklmnopqrstuvwxyz0123456789", so 2010-10-29 13:14:00 might become 5hhyo9v8mk6avy (19 characters reduced to 16).

The class is designed to convert back and forth between alphabets, and also calculate the longest source string that can safely be stored in a target alphabet given a particular number of characters.
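If the class treats each fixed-length string as a number written in base (alphabet size), which is an assumption since the question does not say how the conversion works, then the "longest safe source string" question reduces to comparing powers of the two alphabet sizes. A minimal sketch under that assumption; the function names are hypothetical:

```python
def max_source_length(source_size: int, target_size: int, target_length: int) -> int:
    """Longest fixed-length source string that always fits in target_length target chars.

    Assumes strings are read as base-`source_size` numbers, so n source characters
    can take source_size**n distinct values.
    """
    if source_size < 2 or target_size < 2:
        raise ValueError("alphabets need at least two symbols")
    capacity = target_size ** target_length  # distinct target strings available
    n = 0
    # Exact integer arithmetic avoids floating-point log rounding errors.
    while source_size ** (n + 1) <= capacity:
        n += 1
    return n


def min_target_length(source_size: int, target_size: int, source_length: int) -> int:
    """Fewest target characters that can represent any source string of source_length."""
    if source_size < 2 or target_size < 2:
        raise ValueError("alphabets need at least two symbols")
    needed = source_size ** source_length  # distinct source strings to distinguish
    m = 0
    while target_size ** m < needed:
        m += 1
    return m


if __name__ == "__main__":
    # "0123456789 -:" has 13 symbols; a-z plus 0-9 has 36.
    print(max_source_length(13, 36, 12))
    print(min_target_length(13, 36, 19))
```

With those alphabet sizes, 12 target characters can safely hold 16 source characters, and a 19-character timestamp needs at most 14 target characters (if the decoder knows the source length, so leading "zero" symbols can be restored).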

I was thinking of publishing this through Google Code; however, I'd obviously like other people to find and use it, hence the question of what this is called. I've had to use this approach in two separate projects (with Bloomberg and with a proprietary system) where I needed to generate unique file names of a certain length but wanted to keep some plaintext, so GUIDs weren't appropriate.

asked Oct 29 '10 15:10

Jon Freedman

2 Answers

Your examples bear some similarity to a dictionary coder with fixed source and target dictionaries. Fibonacci coding is also worth a look: it has a fixed target dictionary made of variable-length bit strings.
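To make "a fixed dictionary of variable-length codewords" concrete, here is a minimal sketch of standard Fibonacci coding of positive integers (function names are my own). Using it for text would additionally require mapping each source symbol to an integer, for example by frequency rank; that mapping is an assumption and is not shown. Every codeword ends in "11", so codewords can be concatenated and split again without a length header.

```python
def fibonacci_encode(n: int) -> str:
    """Fibonacci codeword for a positive integer n (smallest term first)."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # fibs[-1] is now the largest Fibonacci number <= n
    bits = ["0"] * len(fibs)
    # Greedy from the largest term gives the Zeckendorf representation,
    # which never uses two consecutive Fibonacci numbers.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            bits[i] = "1"
            n -= fibs[i]
    # The extra "1" makes the codeword end in "11", which cannot occur earlier.
    return "".join(bits) + "1"


def fibonacci_decode_stream(bits: str) -> list[int]:
    """Split a concatenation of Fibonacci codewords back into integers."""
    values: list[int] = []
    total, a, b, prev = 0, 1, 2, "0"
    for bit in bits:
        if bit == "1" and prev == "1":
            # "11" terminates the current codeword.
            values.append(total)
            total, a, b, prev = 0, 1, 2, "0"
            continue
        if bit == "1":
            total += a
        a, b = b, a + b
        prev = bit
    return values
```

For example, 4 encodes as 1011 and 11 as 001011; small (frequent) values get short codewords.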

I think it also depends on whether it is important that your target alphabet has fixed-width entries. If you allow a fixed alphabet with variable-length codes, your compression ratio can get much closer to the entropy of the source. If the distribution of the source alphabet is known in advance, a static Huffman tree could easily be generated.
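A minimal sketch of generating a static Huffman code from a known symbol distribution, using only the standard library (function names are hypothetical). It produces bit strings; packing those bits into a larger target alphabet such as a-z0-9 would be a separate step not shown here.

```python
import heapq
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a static Huffman code (symbol -> bit string) from known frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a non-empty codeword.
        return {sym: "0" for sym in freqs}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 / 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: dict[str, str]) -> str:
    reverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # the code is prefix-free, so the first match is a symbol
            out.append(reverse[buf])
            buf = ""
    return "".join(out)
```

For frequencies {a: 5, b: 2, c: 1, d: 1} this gives "a" a 1-bit codeword, "b" 2 bits, and "c" and "d" 3 bits each.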

answered Nov 15 '22 11:11

Nate

Here is a simple algorithm:

Note that you don't have to transmit the alphabet used for encoding. Also, unlike standard compression schemes, you don't use (or transmit) the probabilities of the input symbols, so we are simply re-encoding the data.

In this case we can treat the input data as a number represented in a base equal to the cardinality of the input alphabet. We just have to change its representation to another base, which is a simple task.

EDITED example:

input alphabet: ABC, output alphabet: 0123456789

The message ABAC translates to 0102 in base 3 (A = 0, B = 1, C = 2), which is 11 (9 + 2) in base 10.

Written with the output alphabet (base 10), that is simply 11.
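The example can be reproduced with a short sketch (helper names are hypothetical). Python integers have arbitrary precision, so long messages do not overflow.

```python
def to_int(message: str, alphabet: str) -> int:
    """Read message as a number whose digits are positions in alphabet."""
    base = len(alphabet)
    value = 0
    for ch in message:
        value = value * base + alphabet.index(ch)
    return value


def from_int(value: int, alphabet: str) -> str:
    """Write a non-negative integer using alphabet as the digit set."""
    base = len(alphabet)
    if value == 0:
        return alphabet[0]
    digits = []
    while value > 0:
        value, r = divmod(value, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def convert(message: str, source: str, target: str) -> str:
    """Re-express message (base len(source)) in the target alphabet."""
    return from_int(to_int(message, source), target)


print(convert("ABAC", "ABC", "0123456789"))  # 11
```

Converting 11 back with from_int(11, "ABC") gives BAC, not ABAC: the leading A (digit 0) is lost.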

We could have a problem decoding it, because we don't know how many leading 0s to restore at the beginning of the decoded result, so we have to use one of these modifications:

1) encode the length of the data being compressed somewhere in the stream.

2) use a dummy 1 at the start of the stream; our example then becomes:

10102 (base 3) = 81 + 9 + 2 = 92 (base 10).

Now, after decoding, we just have to ignore the leading 1 (this also provides basic error detection).
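A minimal sketch of the dummy-1 modification (option 2), assuming the "1" is the second symbol of the source alphabet (B in the example). Helper names are hypothetical. The check on the sentinel is only a weak error check: it catches some corruptions, not all.

```python
def to_int(message: str, alphabet: str) -> int:
    base = len(alphabet)
    value = 0
    for ch in message:
        value = value * base + alphabet.index(ch)
    return value


def from_int(value: int, alphabet: str) -> str:
    base = len(alphabet)
    if value == 0:
        return alphabet[0]
    digits = []
    while value > 0:
        value, r = divmod(value, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def encode_with_sentinel(message: str, source: str, target: str) -> str:
    # Prepend the digit 1 (source[1]) so leading "zero" symbols survive.
    return from_int(to_int(source[1] + message, source), target)


def decode_with_sentinel(encoded: str, source: str, target: str) -> str:
    digits = from_int(to_int(encoded, target), source)
    if digits[0] != source[1]:
        raise ValueError("sentinel missing: input is corrupted")
    return digits[1:]  # drop the dummy 1


print(encode_with_sentinel("ABAC", "ABC", "0123456789"))  # 92
print(decode_with_sentinel("92", "ABC", "0123456789"))    # ABAC
```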

The main problem with this approach is that in most cases (GCD == 1) each new input character can change the entire output. This is very inefficient and difficult to implement. We end up with arithmetic coding as the best solution (actually a simplified version of it).
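To illustrate the problem, a sketch (helper names hypothetical) that converts every prefix of an input from base 3 to base 10: appending one symbol can rewrite every output digit. For contrast, when the target base is a power of the source base (here 16 = 2^4), each complete group of 4 input bits maps to one fixed output digit, so digits can be emitted as soon as a group is complete and never change afterwards.

```python
def to_int(message: str, alphabet: str) -> int:
    base = len(alphabet)
    value = 0
    for ch in message:
        value = value * base + alphabet.index(ch)
    return value


def from_int(value: int, alphabet: str) -> str:
    base = len(alphabet)
    if value == 0:
        return alphabet[0]
    digits = []
    while value > 0:
        value, r = divmod(value, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def prefix_outputs(message: str, source: str, target: str) -> list[str]:
    """Convert every prefix of message, to see how the output evolves."""
    return [from_int(to_int(message[:i], source), target)
            for i in range(1, len(message) + 1)]


def bits_to_hex_groups(bits: str) -> str:
    """Base 2 -> base 16, emitting one hex digit per complete 4-bit group.

    Groups are aligned from the first bit; a trailing partial group is held back.
    """
    hex_digits = "0123456789abcdef"
    usable = len(bits) - len(bits) % 4
    return "".join(hex_digits[int(bits[i:i + 4], 2)] for i in range(0, usable, 4))


print(prefix_outputs("ABACAB", "ABC", "0123456789"))  # ['0', '1', '3', '11', '33', '100']
print(bits_to_hex_groups("10110100"))                  # b4
```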

answered Nov 15 '22 11:11

ruslik
