Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP Hash Key Array

Tags:

arrays

php

hash

I came across this simple PHP Class on GitHub while searching for Bloom Filters, this was named as a "Bloom Filter" but I think it is more of a "Hash Table" either way I am curious, it is very simple to understand.

It reads in a file of words and creates a Hash Array Key for each word, you can then check if the word exist in the Hash Array.

I am curious though is there any benefit of using this versus just storing the actual word as the array key or value and then checking if that word exist in the array, in theory this would just be adding overhead and doing the same thing, please help me understand what I am missing?

<?php
class Dictionary {
    private $words;
    private $wordsHash;
    public $hashLength;

    public function __construct($filepath, $hashLength) {
        $this->words = file($filepath);
        $this->hashLength = $hashLength;
        foreach($this->words as $word){
            $this->wordsHash[$this->createHash($word)] = true;
        }
        echo 'words: ' . count($this->words) . '   hashes: ' . count($this->wordsHash) . "\n";
    }

    public function createHash($str){
        $hash = substr(md5(trim($str)), 0, $this->hashLength);
        return $hash;
    }

    public function checkDictionary($str){
        $hash = $this->createHash(trim($str));
        if(array_key_exists ($hash , $this->wordsHash)){
            return true;
        }
        return false;
    }

}
?>

dictionary.txt file has 10,000 words in it, I will just show a few for demo

der
die
und
in
den
von
zu
das
mit
sich
des
auf
für
ist

Example usage:

<?php
$dictionary = new Dictionary('dictionary.txt', 30);

if($dictionary->checkDictionary('den')){
    echo 'The Word den Exist in the Hash Table';
}else{
    echo 'The Word den DOES NOT Exist in the Hash Table';
}
?>
like image 474
JasonDavis Avatar asked Dec 12 '22 02:12

JasonDavis


1 Answers

The idea with this seems to be that searching for a key is much faster than searching for a specific value in an array. This is especially true for very large arrays. However, I would recommend a simpler approach to (as you already said) avoid overhead and collisions:

$words = array_flip( file($filename) );

// The actual values are now the keys!
// So checking for a word works like this:
if (isset($words['und'])) {
    // ...

// Travling through the words works like this:
foreach ($words as $word => $i) {
    // ...

(PS: This code will not work as expected since every word will include the line break, so you will need to strip that first. But I hope you get the idea.)

like image 184
Niko Avatar answered Dec 14 '22 16:12

Niko