30. Bloom Filters (Introduction to Probabilistic Structures)
While most data structures guarantee correctness, Bloom Filters trade guaranteed accuracy for incredible space efficiency, using probability.
Concept
A Bloom Filter is a space-efficient, probabilistic data structure used to test whether an element is definitely not a member of a set, or is potentially a member of a set.
It consists of a large bit array and several independent hash functions.
How it Works
- To add an element, run it through all
khash functions. Set the bits at the resultingkindices in the array to 1. - To check if an element exists, run it through the same
khash functions. If allkbits are 1, it might exist. If any bit is 0, it definitely does not exist.
Error and Use Case
- The Downside (False Positives): Bloom filters can generate False Positives (saying an element is present when it's not), but never False Negatives.
- Use Cases: Preventing web browsers from visiting malicious URLs (by checking a blacklist) or checking database queries to avoid costly disk reads if an element is clearly missing.