Constant-time hash for strings?

Another question on SO brought up the facilities some languages provide for hashing strings to give them fast lookup in a table. Two examples are Dictionary<> in .NET and the {} storage structure in Python. Other languages certainly support such a mechanism: C++ has its map, LISP has an equivalent, as do most other modern languages.

It was contended in the answers to that question that hash algorithms on strings can be conducted in constant time, with one SO member who has 25 years of experience in programming claiming that anything can be hashed in constant time. My personal contention is that this is not true unless your particular application places a bound on the string length, meaning that some constant K dictates the maximal length of a string.

I am familiar with the Rabin-Karp algorithm which uses a hashing function for its operation, but this algorithm does not dictate a specific hash function to use, and the one the authors suggested is O(m), where m is the length of the hashed string.
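To make the O(m) point concrete, here is a minimal Python sketch of a polynomial hash of the kind commonly paired with Rabin-Karp. The function names and the BASE and MOD values are assumptions picked for this example, not something the algorithm prescribes. Hashing a window from scratch touches all m characters; only the update when the window slides by one position is constant time.

```python
BASE = 256            # assumed radix for this sketch
MOD = 1_000_000_007   # assumed large prime modulus

def poly_hash(s):
    """Hash all of s from scratch: one step per character, so O(m)."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h

def roll(h, old, new, high):
    """Slide a fixed-length window one position in O(1).

    high must equal BASE ** (m - 1) % MOD for window length m.
    """
    h = (h - ord(old) * high) % MOD       # drop the outgoing character
    return (h * BASE + ord(new)) % MOD    # shift and append the incoming one

text, m = "abcd", 3
high = pow(BASE, m - 1, MOD)
h = poly_hash(text[:m])                   # O(m) initial hash of "abc"
h = roll(h, text[0], text[m], high)       # O(1) update to the hash of "bcd"
assert h == poly_hash(text[1:m + 1])
```

The constant-time step only applies to a window whose previous hash is already known, so it does not help when hashing an arbitrary new string.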

I see some other pages such as this one (http://www.cse.yorku.ca/~oz/hash.html) that display some hash algorithms, but it seems that each of them iterates over the entire length of the string to arrive at its value.
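For reference, djb2, one of the functions listed on that page, looks roughly like this when transcribed to Python. The 32-bit mask and the UTF-8 encoding are assumptions of this sketch, standing in for the unsigned integer arithmetic over bytes in the C original. The loop visits every byte, which is the O(m) behaviour described above.

```python
def djb2(s):
    """djb2-style string hash, truncated to 32 bits (an assumption of this sketch)."""
    h = 5381
    for byte in s.encode("utf-8"):
        # h * 33 + byte, kept within 32 bits
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h
```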

From my comparatively limited reading on the subject, it appears that most associative arrays for string types are actually created using a hashing function that operates with a tree of some sort under the hood. This may be an AVL tree or red/black tree that points to the location of the value element in the key/value pair.

Even with this tree structure, if we are to remain on the order of theta(log(n)), with n being the number of elements in the tree, we need a constant-time hash algorithm. Otherwise, we pay the additive penalty of iterating over the string. Even though theta(m) would be eclipsed by theta(log(n)) for indexes containing many strings, we cannot ignore it in a domain where the strings being hashed are very large.

I am aware that suffix trees/arrays and Aho-Corasick can bring the search down to theta(m) at a greater expense in memory, but what I am asking specifically is whether a constant-time hash method exists for strings of arbitrary length, as was claimed by the other SO member.

Thanks.

689

asked Dec 07 '09 18:12

San Jacinto

2 Answers

A hash function doesn't have to (and can't) return a unique value for every string.

You could use the first 10 characters to initialize a random number generator and then use that to pull out 100 random characters from the string, and hash that. This would be constant time.

You could also just return the constant value 1. Strictly speaking, this is still a hash function, although not a very useful one.
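A minimal Python sketch of the sampling idea, assuming zlib.crc32 as the underlying hash and the hypothetical name sampled_hash (neither is specified above). It reads at most seed_len + samples characters regardless of the string's length, relying on Python strings knowing their own length and supporting O(1) indexing.

```python
import random
import zlib

def sampled_hash(s, seed_len=10, samples=100):
    """Hash a fixed number of pseudo-randomly chosen characters of s."""
    if not s:
        return 0  # arbitrary value for the empty string (assumption)
    # Seeding with a str is deterministic, so equal strings sample equal positions.
    rng = random.Random(s[:seed_len])
    n = len(s)
    picked = "".join(s[rng.randrange(n)] for _ in range(samples))
    return zlib.crc32(picked.encode("utf-8"))
```

The price is that two strings which agree on the first 10 characters and on every sampled position hash identically, however much they differ elsewhere.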

175

answered Oct 01 '22 14:10

Mark Byers

In general, I believe that any complete string hash must use every character of the string, and its cost would therefore grow as O(n) for n characters. However, I think that for practical string hashes you can use approximate hashes that can easily be O(1).

Consider a string hash that always uses Min(n, 20) characters to compute a standard hash. Obviously this grows as O(1) with string size. Will it work reliably? It depends on your domain...
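As a rough sketch of that idea in Python, assuming a djb2-style loop as the "standard hash" and the hypothetical name prefix_hash:

```python
def prefix_hash(s, k=20):
    """Hash only the first min(len(s), k) characters of s."""
    h = 5381
    for byte in s[:k].encode("utf-8"):
        # djb2-style step, truncated to 32 bits (an assumption of this sketch)
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

# Strings that differ only after the first 20 characters collide.
assert prefix_hash("a" * 20 + "x") == prefix_hash("a" * 20 + "y")
```

Whether that is acceptable depends on the keys: file paths or URLs sharing a long common prefix would all land in the same bucket.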

answered Oct 01 '22 13:10

Ron Warholic

Tags: string, associative-array, string-algorithm
