Bloom filter usage

I am struggling to understand the usefulness of the Bloom filter. I get its underlying logic: space compaction, fast lookups, false positives, and so on. I just cannot see how the concept is beneficial in a real-life situation. One frequent application is web caching, where a Bloom filter is used to determine whether a given URL is in the cache or not. Why don't we simply access the cache to determine that? If we get a yes, we still need to go to the cache to retrieve the web page (which might not be there), but in the case of a no, we could have got the same answer from the cache itself (which is probably optimized for fast lookups anyway?).

asked Jan 18 '13 16:01

Bober02

1 Answer

Bloom filters are designed for situations where a false negative is a Very Bad Thing and a false positive is acceptable.

For example, suppose that you are making a web browser and have a known blacklist of scam websites. Your blacklist is massive - in the hundreds of gigabytes - so you can't ship it with the browser. However, you can store it on your own servers. In that case, you could ship the browser with a Bloom filter of an appropriate size that holds all the URLs. Before visiting a site, you look it up in the filter. Then, if you get a "no" answer, you're guaranteed that the URL is not blacklisted and can just visit the site. If you get a "yes" answer, the site might be evil, so you can have the browser call up your main server to get the real answer. The fact that you can save a huge number of calls to the server here without ever sacrificing accuracy is important.

The cache idea is similar to this setup. You can query the filter to see whether the page is in the cache. If you get a "no" answer, you're guaranteed it's not cached and can go straight to the expensive operation of pulling the data from the main source. Otherwise, you check the cache to see whether it really is there. In rare instances (a false positive) you will check the cache, find that the page isn't there, and then pull from the main source, but you will never accidentally miss something that really is in the cache. This pays off when the cache lookup is itself costly, for example a disk or network cache, while the filter fits in memory.
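The cache flow can be sketched the same way. The names here (`TinyBloom`, `FilteredCache`, `fetch_from_origin`) and the fixed filter size are hypothetical, and a plain dictionary stands in for what would really be a slow disk or remote cache. The sketch also assumes entries are never evicted, since a standard Bloom filter cannot remove items.

```python
import hashlib


class TinyBloom:
    """Fixed-size Bloom filter, kept small for illustration."""

    def __init__(self, num_bits=8192, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # One salted SHA-256 per hash function; simple rather than fast.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class FilteredCache:
    """Slow cache fronted by an in-memory Bloom filter."""

    def __init__(self, fetch_from_origin):
        self.filter = TinyBloom()
        self.store = {}          # stands in for the slow cache
        self.cache_lookups = 0   # how often the slow cache was consulted
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        if self.filter.might_contain(url):
            self.cache_lookups += 1
            page = self.store.get(url)
            if page is not None:
                return page      # genuine cache hit
            # Otherwise a false positive: fall through to the origin.
        page = self.fetch_from_origin(url)
        self.store[url] = page
        self.filter.add(url)
        return page


origin_calls = []


def fetch_from_origin(url):
    """Stand-in for the expensive fetch from the main source."""
    origin_calls.append(url)
    return f"<html>page for {url}</html>"


cache = FilteredCache(fetch_from_origin)
first = cache.get("http://example.com/a")    # filter says no: origin only
second = cache.get("http://example.com/a")   # filter says maybe: cache hit
```

The first request skips the cache lookup entirely because the filter answers "no"; the second consults the cache once and is served from it without touching the origin.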

Hope this helps!

answered Sep 30 '22 03:09

templatetypedef

Tags: algorithm, data-structures, bloom-filter

Bober02

People also ask

1 Answers

templatetypedef

Recent Activity

Donate For Us