
Designing a Multi-Level Cache Architecture

Design a hierarchical caching system — L1 in-process cache, L2 distributed cache, L3 CDN — with coherence strategies and eviction policies.


A hierarchical cache reduces latency and database load by placing caches at multiple levels.

Cache Hierarchy

Request
  ↓
L1: In-Process Cache (memory, <1ms)
  ↓ miss
L2: Distributed Cache (Redis, 1-5ms)
  ↓ miss
L3: CDN Edge Cache (5-30ms)
  ↓ miss
Origin Database (10-100ms)
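The miss path above can be sketched as a tiered lookup that backfills each level on the way out. Plain dicts stand in for the real stores here; the key and value are illustrative:

```python
# Tiered lookup: check each cache level in order; on a full miss,
# read the origin and backfill every tier so the next request is an L1 hit.
l1, l2, l3 = {}, {}, {}                # in-process, Redis stand-in, CDN stand-in
origin = {"user:1": {"name": "Ada"}}   # database stand-in

def get(key):
    for tier in (l1, l2, l3):
        if key in tier:
            return tier[key]           # hit at this level
    value = origin[key]                # last resort: origin database
    l1[key] = l2[key] = l3[key] = value  # backfill all tiers
    return value

get("user:1")          # misses everywhere, reads origin, backfills
assert "user:1" in l1  # subsequent requests are served from L1
```

A real implementation would also backfill L2 asynchronously on an L1 hit-with-near-expiry, but the read path shape is the same.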

L1: In-Process Cache

In-memory cache within each application instance.

# Caffeine (Java) / cachetools.TTLCache (Python)
# Note: functools.lru_cache has no TTL support, so cachetools is used here.
from cachetools import cached, TTLCache

@cached(TTLCache(maxsize=1000, ttl=60))  # 1000 entries, 60s TTL
def get_user(user_id):
  return redis.get(f'user:{user_id}') or db.query(user_id)

Pros: fastest (<1ms), no network hop.
Cons: not shared across instances, limited size, consistency risk.

L2: Distributed Cache (Redis)

Shared across all application instances.

# Cache-aside: check Redis first; on miss, load from DB and populate.
def get_product(product_id):
  key = f'product:{product_id}'
  cached = redis.get(key)
  if cached:
    return json.loads(cached)

  data = db.query('SELECT * FROM products WHERE id = ?', product_id)
  redis.setex(key, 3600, json.dumps(data))  # 1 hour TTL
  return data

L3: CDN Cache

Geographically distributed, serves static + semi-static responses.

Cache-Control: public, max-age=300, stale-while-revalidate=60
Vary: Accept-Language
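The semantics of that Cache-Control policy at an edge node can be sketched as a small state function; the durations match the header values above:

```python
# How an edge interprets "max-age=300, stale-while-revalidate=60":
# fresh for 300s, then servable-while-revalidating for another 60s.
MAX_AGE = 300
STALE_WHILE_REVALIDATE = 60

def cache_state(age_seconds):
    if age_seconds < MAX_AGE:
        return "fresh"               # serve from edge, no origin contact
    if age_seconds < MAX_AGE + STALE_WHILE_REVALIDATE:
        return "stale-revalidating"  # serve stale now, refresh in background
    return "expired"                 # must fetch from origin

assert cache_state(0) == "fresh"
assert cache_state(320) == "stale-revalidating"
assert cache_state(400) == "expired"
```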

Cache Coherence Problem

Update product price in DB:
  1. Update DB: price = $49.99
  2. Must invalidate: CDN, Redis, all L1 caches

Strategies:
  a. TTL-based expiry (simple, eventual consistency)
  b. Event-driven invalidation via Kafka
  c. Cache-aside with write-through
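Strategy (c) can be sketched as invalidate-on-write: the write goes to the database first, then the shared cache entry is deleted so the next read repopulates it. `db` and `redis_client` below are illustrative stand-ins, not real clients:

```python
import json

# Write path: update the source of truth, then invalidate the L2 entry.
class FakeStore(dict):
    def delete(self, key):
        self.pop(key, None)  # mimics Redis DEL (no error if absent)

db, redis_client = {}, FakeStore()

def update_price(product_id, price):
    db[product_id] = price                        # 1. write to the DB
    redis_client.delete(f"product:{product_id}")  # 2. delete stale cache entry

redis_client["product:123"] = json.dumps({"price": 59.99})
update_price("123", 49.99)
assert "product:123" not in redis_client  # stale entry is gone
```

Deleting rather than overwriting the cache entry avoids a race where a concurrent reader writes an older value back after the update.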

Event-Driven Invalidation

DB write
  ↓
Change Data Capture (Debezium)
  ↓
Kafka: product_updated event
  ↓
Cache Invalidation Service
  → DELETE redis:product:123
  → Purge CDN: POST /purge?url=/products/123
  → L1 expires on next poll (30s max)
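The invalidation service's event handler can be sketched as below. A real service would consume `product_updated` from Kafka and call actual Redis and CDN clients; both are stand-ins here:

```python
# Handler for a product_updated event from the CDC pipeline.
purged_cdn_urls = []
redis_keys = {"product:123": "{...}"}  # stand-in for Redis contents

def handle_product_updated(event):
    pid = event["product_id"]
    redis_keys.pop(f"product:{pid}", None)      # DELETE redis:product:<id>
    purged_cdn_urls.append(f"/products/{pid}")  # POST /purge?url=/products/<id>
    # L1 caches are not contacted directly; their short TTL (<=30s)
    # bounds how long each instance can serve the stale value.

handle_product_updated({"product_id": "123"})
assert "product:123" not in redis_keys
assert purged_cdn_urls == ["/products/123"]
```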

Hit Rate Optimization

Target: >95% L1 hit rate for hot data
Principles:
  - Cache only hot data (LRU eviction)
  - Warm up caches on deployment
  - Pre-populate on anticipated spikes
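Warm-up on deployment can be sketched as pre-loading the hottest keys before an instance takes traffic. `top_keys` and `fetch` are illustrative; real systems derive the hot set from access logs or the previous instance's cache stats:

```python
# Warm-up: populate the in-process cache before serving requests,
# so the first wave of traffic hits L1 instead of stampeding L2/DB.
l1_cache = {}

def fetch(key):
    return f"value-for-{key}"  # stand-in for the Redis/DB lookup

def warm_up(top_keys):
    for key in top_keys:
        l1_cache[key] = fetch(key)

warm_up(["product:1", "product:2"])
assert l1_cache["product:1"] == "value-for-product:1"
```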

Conclusion

Multi-level caches can cut p99 latency by 10-100x on cache-friendly read paths: L1 for the hottest per-instance data, L2 for data shared across instances, CDN for global geographic distribution.
