30. Bloom Filters (Introduction to Probabilistic Structures)

While most data structures guarantee correctness, Bloom Filters trade guaranteed accuracy for incredible space efficiency, using probability.

Concept

A Bloom Filter is a space-efficient, probabilistic data structure used to test whether an element is definitely not a member of a set, or is potentially a member of a set.

It consists of a large bit array and several independent hash functions.

How it Works

To add an element, run it through all k hash functions. Set the bits at the resulting k indices in the array to 1.
To check if an element exists, run it through the same k hash functions. If all k bits are 1, it might exist. If any bit is 0, it definitely does not exist.

Error and Use Case

The Downside (False Positives): Bloom filters can generate False Positives (saying an element is present when it's not), but never False Negatives.
Use Cases: Preventing web browsers from visiting malicious URLs (by checking a blacklist) or checking database queries to avoid costly disk reads if an element is clearly missing.