Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL HASH index

Does anyone know a situation where a PostgreSQL HASH should be used instead of a B-TREE for it seems to me that these things are a trap. They are take way more time to CREATE or maintain than a B-TREE (at least 10 times more), they also take more space (for one of my table.columns, a B-TREE takes up 240 MB, while a HASH would take 4 GB) and I seem to have understood from my googling, that they do not SELECT faster than B-TREEs; yet the HASH may have recently been optimized or google was wrong.

Anyway, I wanted you guy's opinions and experiences. If these HASHs are evil, people should know.

Thanks
Also: what about MySQL's HASHs?

like image 919
Nicholas Leonard Avatar asked Dec 29 '08 22:12

Nicholas Leonard


People also ask

What is hash index Postgres?

Hash Index. Just like the name suggests, Hash indexes in PostgreSQL use a form of the hash table data structure. Hash table is a common data structure in many programming languages. For example, a Dict in Python, a HashMap in Java or the new Map type in JavaScript.

What is a hash index?

A hash index is an index type that is most commonly used in data management. It is typically created on a column that contains unique values, such as a primary key or email address. The main benefit of using hash indexes is their fast performance.

Is hash index faster than B-Tree?

In this run we have changed hash to btree index in pgbench read-only tests. We can see here that the PostgreSQL hash index performs better than the btree index and the performance difference is in the range of 10% to 22%.

What is B-Tree index in PostgreSQL?

PostgreSQL B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly-linked list of pages. A single metapage is stored in a fixed position at the start of the first segment file of the index. All other pages are either leaf pages or internal pages.


2 Answers

Hashes are faster than B-Trees for cases where you have a known key value, especially a known unique value.

Hashes should be used if the column in question is never intended to be scanned comparatively with < or > commands.

Hashes are O(1) complexity, B-Trees are O(log n) complexity ( iirc ) , ergo, for large tables with unique entries, fetching an ITEM="foo", they will be the most efficient way of looking it up.

This is especially practical when these unique fields are used on a join condition.

like image 59
Kent Fredric Avatar answered Sep 21 '22 03:09

Kent Fredric


It's better to use a Hash index for text columns that are searched using = operator only. For example a URL column which needs to be indexed for lookups.

A Hash index is approximately 30% the size of a B-Tree index for something like a URL.

The reduced size allows PostgreSQL to use it's cache memory (Aka, shared_buffers) more efficiently.

like image 41
Clive Paterson Avatar answered Sep 21 '22 03:09

Clive Paterson