Opposite of Bloom filter?

Yes: a lossy hash table or an LRU cache is a data structure with fast O(1) lookup that only gives false negatives. If you ask "Have I run test X?", it will tell you either "Yes, you definitely have" or "I can't remember".

Forgive the extremely crude pseudocode:

setup_test_table():
    create test_table( some large number of entries )
    clear each entry( test_table, NEVER )
    return test_table

has_test_been_run_before( new_test_details, test_table ):
    index = hash( new_test_details, test_table.length )
    old_details = test_table[index].details
    // unconditionally overwrite old details with new details, LRU fashion.
    // perhaps some other collision resolution technique might be better.
    test_table[index].details = new_test_details
    if ( old_details === new_test_details ) return YES
    else if ( old_details === NEVER ) return NEVER
    else return PERHAPS

main()
    test_table = setup_test_table();
    loop
        test_details = generate_random_test()
        status = has_test_been_run_before( test_details, test_table )
        case status of
           YES: do nothing;
           NEVER: run test (test_details);
           PERHAPS: if( rand()&1 ) run test (test_details);
    next loop
end.
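
For anyone who wants something runnable, here is a minimal Python sketch of the same idea. The names `SeenTable`, `Status`, and `check_and_record` are made up for illustration, and it assumes the test details are hashable values; it is not meant as a definitive implementation.

```python
from enum import Enum


class Status(Enum):
    YES = "yes"          # the slot held exactly these details
    NEVER = "never"      # the slot was still empty
    PERHAPS = "perhaps"  # the slot held some other test's details


_EMPTY = object()  # sentinel playing the role of NEVER in the pseudocode


class SeenTable:
    """Fixed-size lossy table: one slot per hash bucket, no chaining."""

    def __init__(self, size):
        self.slots = [_EMPTY] * size

    def check_and_record(self, details):
        index = hash(details) % len(self.slots)
        old = self.slots[index]
        # Unconditionally overwrite, as in the pseudocode above.
        self.slots[index] = details
        if old is _EMPTY:
            return Status.NEVER
        if old == details:
            return Status.YES
        return Status.PERHAPS


table = SeenTable(4)
first = table.check_and_record(1)    # slot 1 is empty
second = table.check_and_record(1)   # slot 1 already holds 1
third = table.check_and_record(5)    # 5 % 4 == 1, collides with 1
fourth = table.check_and_record(1)   # 1 was evicted by 5
```

Small integers are used here because Python hashes them to themselves, which makes the collision between 1 and 5 predictable.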

The exact data structure that accomplishes this task is a direct-mapped cache, which is commonly used in CPUs.

function set_member(set, item)
    set[hash(item) % set.length] = item

function is_member(set, item)
    return set[hash(item) % set.length] == item
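
A rough Python rendering of the two functions above, mostly to show the error behaviour: a collision evicts the older item, so a lookup can wrongly say "no" but never wrongly says "yes". The class name `DirectMappedSet` and its methods are hypothetical, chosen only for this sketch.

```python
class DirectMappedSet:
    """One slot per bucket; a new item simply replaces whatever was there."""

    def __init__(self, size):
        self.slots = [None] * size

    def add(self, item):
        self.slots[hash(item) % len(self.slots)] = item

    def __contains__(self, item):
        # Compares the stored item itself, so a hit is always genuine.
        # (None is used as the empty marker, so None cannot be stored.)
        return self.slots[hash(item) % len(self.slots)] == item


seen = DirectMappedSet(4)
seen.add(1)
had_one_before = 1 in seen   # True: 1 is in its slot
seen.add(5)                  # 5 % 4 == 1, so 5 overwrites 1
has_five = 5 in seen         # True
has_one_after = 1 in seen    # False: a false negative
has_nine = 9 in seen         # False: slot is occupied by 5, but 5 != 9
```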

Is it possible to store the tests that you did not run? This should invert the filter's behavior.

How about an LRUCache?
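
A small sketch of what that could look like in Python, built on `collections.OrderedDict` from the standard library. The class name `LRUSeenSet` and the capacity of 2 are assumptions for the example. Once the cache is full, the least recently used entry is forgotten, which is where the false negatives come from.

```python
from collections import OrderedDict


class LRUSeenSet:
    """Remembers at most `capacity` items, forgetting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def add(self, item):
        self.items[item] = True
        self.items.move_to_end(item)         # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

    def __contains__(self, item):
        if item in self.items:
            self.items.move_to_end(item)     # a lookup also counts as a use
            return True
        return False


seen = LRUSeenSet(2)
seen.add("a")
seen.add("b")
a_seen = "a" in seen      # True, and "a" becomes most recently used
seen.add("c")             # cache is full, so "b" is evicted
b_seen = "b" in seen      # False: a false negative
c_seen = "c" in seen      # True
```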

Use a bit set, as mentioned above. If you know the number of tests you are going to run beforehand, you will always get correct results (present, not present) from the data structure.

Do you know what keys you will be hashing? If so, you should run an experiment to see the distribution of the keys in the Bloom filter so you can fine-tune it to reduce false positives, or what have you.

You might want to check out HyperLogLog as well.
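
To illustrate the bit-set suggestion above: if the tests can be numbered 0 to n-1 ahead of time (an assumption of this sketch, not something the question states), one bit per test gives exact answers with no false positives or false negatives. The `BitSet` class below is hypothetical.

```python
class BitSet:
    """Exact membership for integers in range(size), one bit per value."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def _check(self, i):
        if not 0 <= i < self.size:
            raise IndexError(f"test number {i} is outside 0..{self.size - 1}")

    def add(self, i):
        self._check(i)
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, i):
        self._check(i)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


run = BitSet(10)
run.add(3)
run.add(9)
three_run = 3 in run   # True
four_run = 4 in run    # False
nine_run = 9 in run    # True
```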
