Back to course

Lesson 30: Bloom Filters (Introduction to Probabilistic Structures)

Algorithms: From Zero to Hero (A Beginner's Guide)

30. Bloom Filters (Introduction to Probabilistic Structures)

While most data structures guarantee correctness, Bloom Filters trade guaranteed accuracy for incredible space efficiency, using probability.

Concept

A Bloom Filter is a space-efficient, probabilistic data structure used to test whether an element is definitely not a member of a set, or is potentially a member of a set.

It consists of a large bit array and several independent hash functions.

How it Works

  1. To add an element, run it through all k hash functions. Set the bits at the resulting k indices in the array to 1.
  2. To check if an element exists, run it through the same k hash functions. If all k bits are 1, it might exist. If any bit is 0, it definitely does not exist.

Error and Use Case

  • The Downside (False Positives): Bloom filters can generate False Positives (saying an element is present when it's not), but never False Negatives.
  • Use Cases: Preventing web browsers from visiting malicious URLs (by checking a blacklist) or checking database queries to avoid costly disk reads if an element is clearly missing.