1οΈβ£ What is a Bloom Filter?
- A Bloom Filter is a space-efficient probabilistic data structure
- Used to check:
- π βIs this element present in a set?β
- It can say:
- β Definitely NOT present
- β οΈ Possibly present (may be wrong)
2οΈβ£ Why use Bloom Filter? Trade-offs
β Use when:
- You want to reduce DB calls
- Memory is limited - donβt require much space
- Fast lookup is required
- Approximation is okay
- Simple to implement
β Avoid when:
- You need exact results
- Deletion is required (use counting BF instead)
π Common Use Cases
- Databases (avoid unnecessary disk lookups) - e.g username checks in 20M records
- Before hitting DB:
- Check Bloom Filter
- If NOT present β skip DB call π
- If maybe present β query DB
- Check Bloom Filter
- Before hitting DB:
- Caching systems
- Web crawlers (avoid revisiting URLs)
- Spell checkers
- Distributed systems (like Apache Cassandra, Redis)
3οΈβ£ Structure
- A Bloom Filter consists of:
Bit Array (size = m)
[0, 0, 0, 0, 0, 0, 0, ...]
&&
`k` hash functions
4οΈβ£ How it Works
β€ Insert Operation
- For element
"cat":
- Apply
khash functions:
h1(cat) β 2
h2(cat) β 5
h3(cat) β 8
- Set bits at those indices:
Index: 0 1 2 3 4 5 6 7 8
Array: [0 0 1 0 0 1 0 0 1]
β€ Search Operation
- To check
"cat":- Check bits at
h1, h2, h3
- Check bits at
All = 1 β Possibly present β
- To check
"dog":
If ANY bit = 0 β Definitely NOT present β
Time Complexity
- Insert β
O(k) - Search β
O(k) - Where:
k = number of hash functions
5οΈβ£ False Positives
π What is it?
- Bloom Filter may say:
"dog" is presentβ (but actually not)
π Why?
- Multiple elements may map to same bits
6οΈβ£ Key Properties
| Property | Value |
|---|---|
| False Negatives | β Never |
| False Positives | β Possible |
| Deletion Support | β Not supported (basic version) |
| Space Efficiency | β Very high |
| Lookup Time | β‘ O(k) |
7οΈβ£ Variants
- Counting Bloom Filter
- Uses counters instead of bits
- β Supports deletion
- Scalable Bloom Filter
- Grows dynamically
8οΈβ£ Interview One Liner
βBloom Filter is a space-efficient probabilistic data structure used to check membership. It uses multiple hash functions to set bits in a bit array. It guarantees no false negatives but may have false positives. Itβs commonly used in systems like caches and databases to avoid expensive lookups.β Bloom Filter = Fast + Memory Efficient + Probabilistic