
Hash function for short strings

Tags: c, string, math, hash

I want to send function names from a weak embedded system to the host computer for debugging purposes. Since the two are connected by RS232, which is short on bandwidth, I don't want to send each function's name literally. Some function names are around 15 characters long, and I sometimes want to send them at a fairly high rate.

The solution I thought of is to find a hash function that maps those function names to a single byte, and to send only this byte. The host computer would scan all the functions in the source, compute their hashes using the same function, and then translate each received hash back to the original string.

The hash function must:

  1. Be collision-free for short strings.
  2. Be simple (since I don't want too much code in my embedded system).
  3. Fit in a single byte.

Obviously, it does not need to be secure by any means, only collision-free. So I don't think cryptographic hash functions are worth their complexity here.

Example code:

int myfunc(void) {
    sendToHost(hash("myfunc"));  /* report to the host that myfunc was entered */
    return 0;
}

The host would then be able to present me with a list of the times at which myfunc was executed.

Is there a known hash function that satisfies the above conditions?

Edit:

  1. I assume I will use far fewer than 256 function names.
  2. I can use more than a single byte; two bytes would cover me well.
  3. I prefer a hash function to sharing the same function-to-byte map between the client and the server, because (1) I have no map implementation on the client, and I'm not sure I want to add one just for debugging, and (2) a map requires another tool in my build chain to inject the function-name table into my embedded code. A hash is better in this regard, even if it means I'll get a collision once in a while.
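
To make the two-byte variant concrete, here is a minimal sketch in C. It assumes 32-bit FNV-1a XOR-folded down to 16 bits, which is one reasonable choice and not something named in the question. The function names fnv1a32, hash16 and has_collision are hypothetical. The embedded side would only need hash16; has_collision is meant to run on the host over the scanned function names, so that a rare collision is detected before it causes confusion.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string (assumed choice of hash). */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;        /* FNV-1a 32-bit offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;          /* mix in the next byte */
        h *= 16777619u;              /* FNV 32-bit prime */
    }
    return h;
}

/* XOR-fold the 32-bit value down to the two bytes sent over the link. */
uint16_t hash16(const char *s)
{
    uint32_t h = fnv1a32(s);
    return (uint16_t)((h >> 16) ^ (h & 0xFFFFu));
}

/* Host-side check: returns 1 if any two names share a 16-bit hash, else 0.
 * Quadratic, which is fine for a few hundred names. */
int has_collision(const char *const names[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (hash16(names[i]) == hash16(names[j]))
                return 1;
    return 0;
}
```

With far fewer than 256 names and 65536 possible values, a collision should be uncommon, but it is not impossible. If the host-side check reports one, renaming one of the affected functions is the simplest fix.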
Elazar Leibovich asked Aug 05 '09 12:08



1 Answer

Try minimal perfect hashing:

Minimal perfect hashing guarantees that n keys will map to 0..n-1 with no collisions at all.

C code is included.
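
For a small name set, the same no-collision guarantee can be approximated without a generator tool. The sketch below is not the minimal perfect hashing code this answer refers to. It is a simpler brute-force seed search, and the result is perfect (no collisions among the known names) but not minimal (values are spread over 0..255 rather than 0..n-1). The names seeded_hash8 and find_seed are hypothetical, and the seeded FNV-1a-style mixing is an assumption. The host searches for a seed once, and the embedded side only needs seeded_hash8 with that seed compiled in as a constant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Embedded side: seeded FNV-1a-style hash, folded down to one byte. */
uint8_t seeded_hash8(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed; /* perturb the starting value */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    h ^= h >> 16;                    /* fold 32 bits into the low 8 bits */
    h ^= h >> 8;
    return (uint8_t)h;
}

/* Host side: try seeds 0..max_seed in order. Returns 1 and stores the first
 * seed under which every name gets a distinct byte; returns 0 if none does. */
int find_seed(const char *const names[], size_t n,
              uint32_t max_seed, uint32_t *seed_out)
{
    if (n > 256)
        return 0;                    /* more names than byte values */
    for (uint32_t seed = 0; ; seed++) {
        unsigned char used[256];
        size_t i;
        memset(used, 0, sizeof used);
        for (i = 0; i < n; i++) {
            uint8_t h = seeded_hash8(names[i], seed);
            if (used[h])
                break;               /* collision: this seed fails */
            used[h] = 1;
        }
        if (i == n) {
            *seed_out = seed;
            return 1;
        }
        if (seed == max_seed)
            return 0;
    }
}
```

The search is only practical while the number of names is small relative to 256. As the set grows, the fraction of seeds that avoid every collision shrinks quickly, and a real perfect-hash generator becomes the better tool.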

Martin B answered Oct 16 '22 11:10