Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are hash tables implemented internally in popular languages?

Tags:

c

hashtable

Can someone please shed some light on how popular languages like Python, Ruby implements hash tables internally for symbol lookup? Do they use the classic "array with linked-list" method, or use a balanced tree?

I need a simple (fewer LOC) and fast method for indexing the symbols in a DSL written in C. Was wondering what others have found most efficient and practical.

like image 500
CDR Avatar asked May 24 '09 06:05

CDR


People also ask

How are hash tables implemented in programming?

Hashing is implemented in two steps: An element is converted into an integer by using a hash function. This element can be used as an index to store the original element, which falls into the hash table. The element is stored in the hash table where it can be quickly retrieved using hashed key.

How does a Hashtable work internally?

Hashtable is a kind of Hash map but is synchronized. Hash map is non–synchronized, permits one null key & multiple null values, not-thread safe i.e. cannot share between many threads without proper synchronization, the key/values pairs are stored in Hashtable.

How are hashing algorithms implemented?

Take an array and use the hash function to hash the 26 possible characters with indices of the array. Then iterate over S and increase the value of the current character of the string with the corresponding index for each character. The complexity of this hashing approach is O(N), where N is the size of the string.

Does HashMap use Hashtable internally?

Does HashMap internally use Hashtable? The HashMap uses a hash table (note this is not the Hashtable class that is there in java. util package). Basically, the hash table is a name for a general way of writing a certain kind of data structure.


2 Answers

The classic "array of hash buckets" you mention is used in every implementation I've seen.

One of the most educative versions is the hash implementation in the Tcl language, in file tcl/generic/tclHash.c. More than half of the lines in the file are comments explaining everything in detail: allocation, search, different hash table types, strategies, etc. Sidenote: the code implementating the Tcl language is really readable.

like image 98
zvr Avatar answered Sep 18 '22 13:09

zvr


Perl uses an array with linked lists to hold collisions. It has a simple heuristic to automatically double the size of the array as necessary. There's also code to share keys between hashes to save a little memory. You can read about it in the dated but still relevant Perl Illustrated Guts under "HV". If you're truly adventurous you can dig into hv.c.

The hashing algorithm used to be pretty simple but its probably a lot more complicated now with Unicode. Because the algorithm was predictable there was a DoS attack whereby the attacker generated data which would cause hash collisions. For example, a huge list of keys sent to a web site as POST data. The Perl program would likely split it and dump it into a hash which then shoved it all into one bucket. The resulting hash was O(n) rather than O(1). Throw a whole lot of POST requests at a server and you might clog the CPU. As a result Perl now perturbs the hash function with a bit of random data.

You also might want to look at how Parrot implements basic hashes which is significantly less terrifying than the Perl 5 implementation.

As for "most efficient and practical", use someone else's hash library. For god's sake don't write one yourself for production use. There's a hojillion robust and efficient ones out there already.

like image 32
Schwern Avatar answered Sep 22 '22 13:09

Schwern