[Data Structure] Bloom Filter - A Specialized Hash Map

11/13/2019

To watch the video that goes over this topic (which also visualizes the example section), click here.

What Are Bloom Filters?

A Bloom Filter is a data structure that uses multiple hash functions on a key to mark values in a boolean array. The point of a bloom filter is to record whether a key has been seen before. An example use case for a Bloom Filter is a web crawler, which needs to remember whether it has already visited a website.

A basic bloom filter consists of some hash functions and an array of booleans.

Bloom Filters are NOT about key-value pairs like Hash Maps. There's no "value" associated with a key. The only thing we care about for a key is a boolean: true that we've seen this key before, or false that this key is new.

Bloom filters are useful because they use less memory than a hash map (they store only booleans, not the keys themselves) and still provide fast lookup and insertion times.

The reason bloom filters aren't incredibly common is that they can give wrong answers: a Bloom Filter might say that you've seen a key before when you really haven't (a false positive). It never errs in the other direction: if it says a key is new, the key really is new. We'll see this in the upcoming example section.

How Bloom Filters Work

A bloom filter can be represented as a list of k hash functions and an array of booleans that starts off with all values initialized to false.

A hash function is simply a function that takes some input, and transforms that into some number (which we'll use as the index in our array) as output. The same input will always produce the same output.
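To make that concrete, here is a minimal sketch of a toy hash function. The name `toy_hash` and the multiply-by-31 mixing step are assumptions made for illustration, not something from the video, and a real bloom filter would want better-distributed hashes.

```python
def toy_hash(key: str, array_size: int) -> int:
    """Map a string to an array index in the range [0, array_size)."""
    total = 0
    for ch in key:
        # Mix each character's code point into a running total.
        total = total * 31 + ord(ch)
    # The modulo keeps the result inside the array bounds.
    return total % array_size


index = toy_hash("data", 16)
# The same input always produces the same output.
assert index == toy_hash("data", 16)
assert 0 <= index < 16
```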

We care about two operations in a bloom filter:

Inserting a key.
Querying if a key was seen before.

Insertion works by inputting a key into all k hash functions, and for each outputted array index, marking that array value as true.

Querying works by inputting a key into all k hash functions, and for each outputted array index, checking whether that array value is true. If all array values are true, then the key was probably seen before (it could still be a false positive). If any array value is false, then the key definitely wasn't seen before.

Inserting and querying both take O(k) time, assuming each hash function runs in constant time.
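The two operations above can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the class name `BloomFilter`, the `size` parameter, and the trick of simulating k hash functions by salting SHA-256 with a seed are all choices made for this example, not the only way to do it.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: k hash functions over one boolean array."""

    def __init__(self, size: int, k: int) -> None:
        self.size = size
        self.k = k
        self.bits = [False] * size  # every slot starts as False

    def _indexes(self, key: str) -> list:
        # Assumption: simulate k hash functions by salting SHA-256 with 0..k-1.
        indexes = []
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            indexes.append(int.from_bytes(digest[:8], "big") % self.size)
        return indexes

    def insert(self, key: str) -> None:
        # Mark every index the k hash functions produce.
        for i in self._indexes(key):
            self.bits[i] = True

    def query(self, key: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[i] for i in self._indexes(key))


seen = BloomFilter(size=1024, k=3)
seen.insert("https://example.com")
print(seen.query("https://example.com"))  # True: an inserted key always matches
```

Both `insert` and `query` loop over the k indexes exactly once, which is where the O(k) cost comes from.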

Bloom Filter Example

Let's say we have a bloom filter with an array initialized to false (and pretend our array can hold negative numbers), and let's say k=3, so we have 3 hash functions (each of which takes a string s as input, for simplicity).

A bloom filter initialized.

The following operations show the various interactions with a bloom filter (I'll show these in the video):

Insert "data"
Insert "dog"
Query "data" -> True we've seen it
Query "i" -> False we haven't seen it
Query "cat" -> True we've seen it (even though we never inserted it)
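The sequence above can be reproduced with a tiny sketch. The three hash functions here (`h1`, `h2`, `h3`) and the array size of 4 are hypothetical choices picked so the collision shows up quickly; they are not the hash functions used in the video.

```python
ARRAY_SIZE = 4  # deliberately tiny so a false positive appears quickly


# Three toy hash functions (hypothetical, chosen only for this walkthrough).
def h1(s: str) -> int:
    return len(s) % ARRAY_SIZE


def h2(s: str) -> int:
    return ord(s[0]) % ARRAY_SIZE


def h3(s: str) -> int:
    return sum(ord(ch) for ch in s) % ARRAY_SIZE


HASHES = [h1, h2, h3]
bits = [False] * ARRAY_SIZE


def insert(key: str) -> None:
    for h in HASHES:
        bits[h(key)] = True


def query(key: str) -> bool:
    return all(bits[h(key)] for h in HASHES)


insert("data")        # hashes to indexes 0, 0, 2
insert("dog")         # hashes to indexes 3, 0, 2
print(query("data"))  # True: it was inserted
print(query("i"))     # False: index 1 was never set
print(query("cat"))   # True: indexes 3, 3, 0 were all set by other keys
```

"cat" was never inserted, but every index it hashes to was already marked by "data" or "dog", so the filter reports a false positive.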

Conclusion

Bloom Filters are important data structures and they're frequently asked about in interviews. They're good because they're efficient on space, but the drawback is that they can produce false positives, reporting that a key was seen before when it really wasn't.

We've only looked at a basic bloom filter example to get the concept behind them, because optimizing a real bloom filter takes a lot of tuning to decide a good k and depends on the amount of memory you have. The basic concept of a bloom filter is easy enough to code, but coding a real one is a bit involved (because of the complex nature of making good hash functions).
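To give a feel for that tuning, here is a small sketch based on the standard textbook sizing formulas for bloom filters, which assume the hash functions behave like independent, uniform random functions. The function names `suggest_parameters` and `estimated_false_positive_rate` are made up for this example.

```python
import math


def suggest_parameters(expected_keys: int, false_positive_rate: float) -> tuple:
    """Return (array_size, k) for a target false positive rate."""
    # Array size: m = -n * ln(p) / (ln 2)^2
    array_size = math.ceil(
        -expected_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    # Number of hash functions: k = (m / n) * ln 2
    k = max(1, round(array_size / expected_keys * math.log(2)))
    return array_size, k


def estimated_false_positive_rate(array_size: int, k: int, inserted_keys: int) -> float:
    # Approximation: (1 - e^(-k * n / m))^k
    return (1 - math.exp(-k * inserted_keys / array_size)) ** k


size, k = suggest_parameters(expected_keys=1_000_000, false_positive_rate=0.01)
print(size, k)  # roughly 9.6 million slots, with k = 7
print(estimated_false_positive_rate(size, k, 1_000_000))  # close to 0.01
```

The tradeoff is visible in the formulas: asking for a lower false positive rate means a bigger array, and the best k depends on how many slots you have per expected key.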

Keep this data structure in mind during interviews and bookmark this page if you ever need a refresher.

References
1. https://prakhar.me/articles/bloom-filters-for-dummies/

Author

Hi, I'm srcmake. I play video games and develop software.

License: All code and instructions are provided under the MIT License.
