How are hash tables implemented internally in popular languages?

Tags: c, hashtable

Can someone please shed some light on how popular languages such as Python and Ruby implement hash tables internally for symbol lookup? Do they use the classic "array with linked list" method, or do they use a balanced tree?

I need a simple (few lines of code) and fast method for indexing the symbols in a DSL written in C. I was wondering what others have found most efficient and practical.


asked May 24 '09 06:05

CDR

2 Answers

The classic "array of hash buckets" you mention is used in every implementation I've seen.

One of the most instructive versions is the hash implementation in the Tcl language, in the file tcl/generic/tclHash.c. More than half of the lines in the file are comments explaining everything in detail: allocation, search, the different hash table types, strategies, etc. Sidenote: the code implementing the Tcl language is really readable.
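For reference, here is a minimal sketch of the "array of hash buckets" layout for a symbol table. It is not taken from tclHash.c or any other implementation mentioned here: the names (SymTab, sym_find, sym_set), the fixed bucket count, and the djb2-style hash are all assumptions chosen for brevity.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64  /* fixed bucket count; an assumption for this sketch */

/* One symbol: a name, a value, and a link to the next entry in the same bucket. */
typedef struct Sym {
    char *name;
    int value;
    struct Sym *next;
} Sym;

/* The table is just an array of bucket heads; zero-initialize it before use. */
typedef struct {
    Sym *buckets[NBUCKETS];
} SymTab;

/* djb2-style string hash, picked only because it is short. */
static unsigned long sym_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the entry for name, or NULL if it is not in the table. */
static Sym *sym_find(SymTab *t, const char *name)
{
    Sym *e = t->buckets[sym_hash(name) % NBUCKETS];
    for (; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Insert or update a symbol; returns 0 on success, -1 on allocation failure. */
static int sym_set(SymTab *t, const char *name, int value)
{
    Sym *e = sym_find(t, name);
    if (e != NULL) {
        e->value = value;
        return 0;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);  /* the table owns its own copy of the name */
    e->value = value;
    unsigned long i = sym_hash(name) % NBUCKETS;
    e->next = t->buckets[i];     /* push onto the front of the bucket's chain */
    t->buckets[i] = e;
    return 0;
}
```

With a fixed bucket count, lookups slow down as the chains get longer, which is why the real implementations resize the array.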

answered Sep 18 '22 13:09

zvr

Perl uses an array with linked lists to hold collisions. It has a simple heuristic to automatically double the size of the array as necessary. There's also code to share keys between hashes to save a little memory. You can read about it in the dated but still relevant PerlGuts Illustrated under "HV". If you're truly adventurous you can dig into hv.c.
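To illustrate the doubling idea, here is a rough sketch of a chained table that doubles its bucket array and relinks every entry. This is not Perl's code: the names (Table, table_grow, table_add), the starting size of 8, and the "grow when entries outnumber buckets" threshold are assumptions, and Perl's actual heuristic lives in hv.c.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *key;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    size_t nbuckets;  /* kept a power of two so "& (n - 1)" selects a bucket */
    size_t count;     /* number of stored keys */
} Table;

static size_t hash_str(const char *s)
{
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static int table_init(Table *t)
{
    t->nbuckets = 8;
    t->count = 0;
    t->buckets = calloc(t->nbuckets, sizeof *t->buckets);
    return t->buckets ? 0 : -1;
}

/* Double the bucket array and move each entry to its new bucket. */
static int table_grow(Table *t)
{
    size_t n = t->nbuckets * 2;
    Entry **nb = calloc(n, sizeof *nb);
    if (nb == NULL)
        return -1;
    for (size_t i = 0; i < t->nbuckets; i++) {
        Entry *e = t->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            size_t j = hash_str(e->key) & (n - 1);
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = n;
    return 0;
}

static int table_contains(const Table *t, const char *key)
{
    const Entry *e = t->buckets[hash_str(key) & (t->nbuckets - 1)];
    for (; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return 1;
    return 0;
}

/* Add key if absent; returns 0 on success, -1 on allocation failure. */
static int table_add(Table *t, const char *key)
{
    if (table_contains(t, key))
        return 0;
    /* Assumed threshold: grow once there is one entry per bucket on average. */
    if (t->count >= t->nbuckets && table_grow(t) != 0)
        return -1;
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    e->key = malloc(len);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, len);
    size_t i = hash_str(key) & (t->nbuckets - 1);
    e->next = t->buckets[i];
    t->buckets[i] = e;
    t->count++;
    return 0;
}
```

Because the size doubles each time, the cost of relinking is spread over many inserts, so the average insert stays cheap.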

The hashing algorithm used to be pretty simple, but it's probably a lot more complicated now with Unicode. Because the algorithm was predictable, there was a DoS attack whereby the attacker generated data that would cause hash collisions: for example, a huge list of keys sent to a web site as POST data. The Perl program would likely split it and dump it into a hash, which then shoved all the keys into one bucket. Lookups and inserts on the resulting hash were O(n) rather than O(1). Throw a whole lot of POST requests like that at a server and you might clog the CPU. As a result, Perl now perturbs the hash function with a bit of random data.
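The general idea of perturbing the hash can be sketched like this. It is an illustration only, not Perl's algorithm: the function name seeded_hash and the choice of FNV-1a with the seed folded into its starting state are assumptions.

```c
#include <stdint.h>

/* FNV-1a over a NUL-terminated string, with a per-process seed XORed into
   the initial state. The seed should be picked once at startup from an
   unpredictable source and kept secret; how to obtain it is left out here. */
static uint32_t seeded_hash(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;  /* FNV offset basis, perturbed */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;               /* FNV prime */
    }
    return h;
}
```

An attacker who does not know the seed cannot precompute a set of keys that is sure to land in one bucket. A simple seeded function like this only shows the shape of the fix; keyed hashes such as SipHash were designed specifically to resist this kind of attack.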

You might also want to look at how Parrot implements basic hashes, which is significantly less terrifying than the Perl 5 implementation.

As for "most efficient and practical", use someone else's hash library. For god's sake don't write one yourself for production use. There are a hojillion robust and efficient ones out there already.
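As one example of leaning on existing code, POSIX systems ship a very basic table in <search.h> (hcreate, hsearch, hdestroy). The wrapper names dsl_define and dsl_lookup below are made up for this sketch. Treat it as the smallest possible starting point rather than a recommendation: there is only one table per process, its capacity is fixed when hcreate is called, and keys are not copied, so they must outlive the table.

```c
#include <search.h>
#include <stddef.h>

/* Store value under name in the process-wide POSIX table.
   hcreate() must have been called first. Returns 0 on success,
   -1 if the entry could not be added (for example, the table is full). */
static int dsl_define(char *name, void *value)
{
    ENTRY item = { .key = name, .data = value };
    ENTRY *slot = hsearch(item, ENTER);
    if (slot == NULL)
        return -1;
    slot->data = value;  /* overwrite if name was already present */
    return 0;
}

/* Return the stored value for name, or NULL if it was never defined. */
static void *dsl_lookup(char *name)
{
    ENTRY item = { .key = name, .data = NULL };
    ENTRY *slot = hsearch(item, FIND);
    return slot ? slot->data : NULL;
}
```

A typical call sequence would be hcreate(1024) once at startup, then dsl_define and dsl_lookup as symbols are parsed. If you need several tables, deletion, or growth, a dedicated library is the better fit.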

answered Sep 22 '22 13:09

Schwern
