What Are Bloom Filters?A Bloom Filter is a data structure that uses multiple hash functions on a key to mark values in a boolean array. The point of a bloom filter is to store whether a key was visited before. An example usecase for a Bloom Filter might be a web crawler, which needs to store whether we've visited a website before. Bloom Filters are NOT about keyvalue pairs like Hash Maps. There's no "value" associated with a key. The only thing we care about for a key is a boolean: true that we've seen this key before, or false that this key is new. Bloom filters are good because they use less memory than a hash map and still provide fast lookup and insertion times. The reason bloom filters aren't incredibly common is because it's possible they give wrong answers: a Bloom Filter might say that you've visited a key before when you really haven't. We'll see this in the upcoming example section. How Bloom Filters WorkA bloom filter can be represented as a list of k hash functions and an array of booleans that starts off with all values initialized to false. A hash function is simply a function that takes some input, and transforms that into some number (which we'll use as the index in our array) as output. The same input will always produce the same output. We care about two operations in a bloom filter:
Insertion works by inputting a key into all k hash functions, and for each outputted array index, marking that array value as true. Querying works by inputting a key into all k hash functions, and for each outputted array index, checking whether that array value is true. If all array values are true, then the key was seen before. If any array values are false, then the key wasn't seen before. The time complexity of inserting and querying are both O(k). Bloom Filter ExampleLet's say we have a bloom filter with an array initialized to false (and pretend our array can hold negative numbers), and let's say k=3 so that we have 3 hash functions (which takes a string s as input, for simplicity). The following operations show the various interactions with a bloom filter (I'll show these in the video):
ConclusionBloom Filters are important data structures and they're frequently asked about in interviews. They're good because they're efficient on space, but the drawback is that they produce false positives, possibly saying it's seen a key before when it really hasn't. We've only looked at a basic bloom filter example to get the concept behind them, because optimizing a real bloom filter takes a lot of tuning to decide a good k and depends on the amount of memory you have. And of course, we saw no code today because the basic concept of a bloom filter is easy enough to code, but coding a real one is a bit involved (because of the complex nature of making good hash functions). Keep this data structure in mind during interviews and bookmark this page if you ever need a refresher. Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.
Twitter: @srcmake Discord: srcmake#3644 Youtube: srcmake Twitch: www.twitch.tv/srcmake Github: srcmake
To see the youtube video where I draw out how each algorithm works (recommended), please click here.
Problem Statement
Given an array of integers, move all zeroes to the end of the array. Ideally, use no extra space, and keep the nonzero elements stable (meaning the order is preserved).
Attempt #1: Keep Moving Zeroes To The Right  Stable  O(N^2) Time  O(1) Space
A simple algorithm is to go through the array from left to right, and every time you encounter a zero, you search the rest of the array for a nonzero to swap with. However, this approach has a bad Time complexity O(N^2) so it's not good.
Here's some pseudocode showing how you'd implement this:
Stable: Yes
Time: O(N^2) Space: O(1) Attempt #2: Place Elements Into New Array  Stable  O(N) Time  O(N) Space
An easy approach to this question is to create a new array, and place all nonzero elements into it. Then fill the remainder of the new array with zeroes. This approach takes O(N) extra space, so it's not ideal.
Here's some pseudocode showing how you'd implement this:
Stable: Yes
Time: O(N) Space: O(N) Attempt #3 (Best): Special Index Tracking  Stable  O(N) Time  O(1) Space
The ideal solution is a bit tricky. The idea is to go through the array and move all nonzero elements to the left. Where on the left do we write each nonzero element too? We'll keep track of that index basically by counting how many nonzeroes we saw so far. This is the ideal solution because the Time complexity is O(N) and the Space complexity is O(1).
Here's some C++ code for how you'd implement this:
Stable: Yes
Time: O(N) Space: O(1) Edge Cases
The complete code for this project can be found on my algorithm questions and answers repo on github here.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.
Twitter: @srcmake Discord: srcmake#3644 Youtube: srcmake Twitch: www.twitch.tv/srcmake Github: srcmake
To see the youtube video explaining this tutorial (which includes drawings that show how the recursion works), please click here.
The Problem  Repeatedly Finding The Minimum From A Range
Let's say that we have the following array.
What's the minimum value from index 0 to index 6? > 1
What's the minimum value from index 0 to index 5? > 2 What's the minimum value from index 1 to index 1? > 8 What's the minimum value from index 4 to index 5? > 4
It's easy to find the answer to these as a human looking at this small array, but how do we write a program to solve this?
Naive Algorithm  Use A For Loop
The simple thing to go through each number from index L to index R and check if the number at each index is smaller than the current minimum.
Here's some pseudocode that does that.
The running time of this function, used to query the array from range [L,R] is O(N), since we may be asked to query the entire array.
We don't have to do any extra work since we're given the array, so it costs O(1) extra space and O(1) time to build. And updating a value in the array costs O(1) time since we can access it's index directly. Note: It's possible to get the query time to be smaller by memoizing results after we calculate them, but it then requires O(N^2) space/time to build the table to store the values. These stats are pretty good, but it's possible to do even better using segment trees. Segment Trees
If you're asked to make multiple queries on subranges on an array, then segment trees are the ideal data structure to use.
The following sections will show how to build a segment tree structure, how to fill it with the appropriate values, how to query the segment tree for the value you need for a given subrange, and how to update the segment tree if a value of the original range changes. Building An Empty Segment Tree
The first thing we need to do is make an empty tree. The segment tree is represented as an array, we just need to know what the size of the array is.
Here's the pseudocode to calculate the array size and create the array.
This array represents our segment tree. (Each index of the array is a node in our tree.)
Filling In The Segment Tree With Values
Now that we've built our segment tree, we need to fill it in with the values.
To fill in the segment tree, starting at the root, we'll say that the minimum of a node is the lesser value of its two children, and so we recursively search children for their minimums. The nodes that represent just one element (ex. [4,4]), however will say it's minimum is just the single value it represents from the original array. Here's the pseudocode implementation for filling in the segment tree.
The running time of filling the values in the segment tree is O(N), since we visit each element of the original array once, and we touch each leaf of the segment tree once.
Querying The Segment Tree
After we've built a tree, we want to be able to search the tree for its values. (Which, in this case, is the minimum value between indices L and R.)
Similar to the technique we used before, we recursively search the tree, starting from the root, and if a node is within the range [L, R], then we check it's value as a valid minimum of the range.
The running time of this operation is O(log N). At worst, we'll have to go down the height of the tree twice, once for the left subtree, and one for the right subtree.
Updating A Value In The Segment Tree
If one of the elements of the original array needs to be updated with a new value, then we simply go through that element's direct path in the segment tree and update the minimum accordingly.
The running time is O(log N), since we only need to travel the height of the segment tree once. (The leaf node representing the updated index, up to the root of the tree.)
Adding/Deleting A Value In The Array
Basically, if you have to add a new value to the original array, then you have to construct a brand new segment tree. This ends up costing O(N) time to build the new tree. (Don't forget to delete the old tree.)
C++ Segment Tree Full Class Implementation
A full Segment Tree C++ implementation can be found on github here: https://github.com/srcmake/segmenttree
Note that the code was meant to be readable for this tutorial, and not necessarily "efficient". (Meaning some operations could be improved, or some code could be moved to functions. (Ex. Calculating the middle index.)) Here's a sample of the class definition (inside a file named segmenttree.hpp):
The code is similar to our pseudocode, with a few private variables included. (The definitions for each function can be found in the actual segment tree class file on github.
An example of using this class is...
Extending Segment Tree for Other Problems
This tutorial showed a Segment Tree designed for Range Minimum Queries (RMQ), specifically for storing integers, but of course segment trees can be extended for more than just finding the minimum.
We can also:
Conclusion
Segment trees are a data structure that comes up in algorithm competitions (and there's usually no time to code one from scratch), so it's better to have the code ready beforehand to copy and paste. So bookmark this blogpost, subscribe on youtube, and star the github repo.
Keep in mind that this tutorial was meant for demonstration purposes, and there are optimizations and edge cases we could've covered. Feel free to experiment with those on your own.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.
Twitter: @srcmake Discord: srcmake#3644 Youtube: srcmake Twitch: www.twitch.tv/srcmake Github: srcmake 
AuthorHi, I'm srcmake. I play video games and develop software. Protip: Click the "DIRECTORY" button in the menu to find a list of blog posts.
License: All code and instructions are provided under the MIT License.
