Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does a hash table work? Is it faster than "SELECT * from .."

Let's say, I have :

Key | Indexes | Key-values
----+---------+------------
001 | 100001  | Alex
002 | 100002  | Micheal
003 | 100003  | Daniel

Lets say, we want to search 001, how to do the fast searching process using hash table?

Isn't it the same as we use the "SELECT * from .. " in mysql? I read alot, they say, the "SELECT *" searching from beginning to end, but hash table is not? Why and how?

By using hash table, are we reducing the records we are searching? How?

Can anyone demonstrate how to insert and retrieve hash table process in mysql query code? e.g.,

SELECT * from table1 where hash_value="bla" ...

Another scenario: If the indexes are like S0001, S0002, T0001, T0002, etc. In mysql i could use:

SELECT * from table WHERE value = S*

isn't it the same and faster?

like image 319
roa3 Avatar asked Feb 12 '09 06:02

roa3


People also ask

Is Select * slower than select column?

For your question just use SELECT *. If you need all the columns there's no performance difference.

Do indexes make queries faster?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

Is indexing faster than hashing?

Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.

Is join faster than select?

The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.


1 Answers

A simple hash table works by keeping the items on several lists, instead of just one. It uses a very fast and repeatable (i.e. non-random) method to choose which list to keep each item on. So when it is time to find the item again, it repeats that method to discover which list to look in, and then does a normal (slow) linear search in that list.

By dividing the items up into 17 lists, the search becomes 17 times faster, which is a good improvement.

Although of course this is only true if the lists are roughly the same length, so it is important to choose a good method of distributing the items between the lists.

In your example table, the first column is the key, the thing we need to find the item. And lets suppose we will maintain 17 lists. To insert something, we perform an operation on the key called hashing. This just turns the key into a number. It doesn't return a random number, because it must always return the same number for the same key. But at the same time, the numbers must be "spread out" widely.

Then we take the resulting number and use modulus to shrink it down to the size of our list:

Hash(key) % 17

This all happens extremely fast. Our lists are in an array, so:

_lists[Hash(key % 17)].Add(record);

And then later, to find the item using that key:

Record found = _lists[Hash(key % 17)].Find(key);

Note that each list can just be any container type, or a linked list class that you write by hand. When we execute a Find in that list, it works the slow way (examine the key of each record).

like image 149
Daniel Earwicker Avatar answered Oct 15 '22 15:10

Daniel Earwicker