Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google Chrome usage of bloom filter

I was reading an wikipedia article on the usage of Bloom Filters . It was mentioned in the article that Bloom filters are used by Google Chrome to detect whether a URL entered is malicious. Because of the presence of false positive

Google Chrome web browser uses a Bloom filter to identify malicious URLs. Any URL is first checked against a local Bloom filter and only upon a hit ,a full check of the URL is performed

I am guessing full check means that Google stores a harsh table of the list of malicious URL and the URL is hashed to checked if it is present in the table. If this is the case , isent it better to just have the hash table instead of hash table + bloom filter??

Please enlightened me on this , is my version of full check correct ???

like image 520
Computernerd Avatar asked Sep 20 '25 01:09

Computernerd


1 Answers

A Bloom filter is a Probabilistic data structure tells us that the element either definitely is not in the set or may be in the set. Bloom filter takes less space (depends on hash functions configured and error rate) compared to the Hashmap. Hashmap can determine whether an element exists or not, whereas bloomfilter can only deterministically check for non existence of an element.

Let us look at the Google Chrome use case. When a user enters a URL it should validate whether the URL is a safe or not. To validate the URL, chrome can make a call to the google serve (internally google can maintain any data structure to find this out). However, the challenge with this approach is multiple fold.For every URL request served on chrome the URL validation happens through Google Server which adds additional dependency on google servers, network roundtrip time, and requirement to maintain high availability to validate the URLs for all the URLs fired from chrome browsers across the world.

Since this data does not change very often (may be updated in an hour or so), chrome might have considered bundling all the malicious sites data as a bloom filter and google might sync this data periodically with the clients(malicious sites are less compared to the full blown websites. ). When user opens a url in chrome it checks the bloom filter if the URL does not exist it is safe. If it exists then bloom filter is not sure about it so the traffic goes to the google server for validating (this traffic will be way less compared to routing all the traffic).

like image 117
Aditya Avatar answered Sep 23 '25 10:09

Aditya