Designing a Smart Queue Management System

Build an intelligent queue with priority handling, dead letter queues, poison message detection, back-pressure, and consumer scaling.

srikanthtelkalapally888@gmail.com

A robust message queue needs more than enqueue and dequeue: it also needs priority handling, failure handling, and back-pressure.

Queue Architectures

Point-to-Point (Queue):
  Producer → Queue → One Consumer per message
  (Competing consumers share the load)

Pub/Sub (Topic):
  Producer → Topic → Many Consumers
  (Fan-out, each consumer gets every message)
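
To make the difference between the two delivery models concrete, here is a minimal in-memory sketch. The class names `PointToPointQueue` and `Topic` are hypothetical and exist only for illustration; a real broker would add persistence, acknowledgements, and networking.

```python
from collections import deque


class PointToPointQueue:
    """Each message is delivered to exactly one of the competing consumers."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever consumer calls receive() first takes the message.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each subscriber gets its own copy of every published message."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        self._subscribers[name] = deque()

    def publish(self, message):
        # Fan-out: append the message to every subscriber's inbox.
        for inbox in self._subscribers.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._subscribers[name]
        return inbox.popleft() if inbox else None
```

With the queue, two consumers calling `receive()` split the messages between them. With the topic, every subscriber registered before `publish()` sees the same message.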

Priority Queue

Queues by priority:
  high_priority.fifo   → Payment jobs (p99 < 1s)
  normal.fifo          → Email jobs (p99 < 10s)
  low_priority.fifo    → Report jobs (best effort)

Workers:
  Check high_priority first
  Fall through to normal, then low_priority, if empty
  (Strict ordering can starve low_priority under sustained load)
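
The worker polling rule can be sketched as follows. The queue names are taken from the list above; the `deque` objects and the `next_job` helper are assumptions standing in for real queue clients.

```python
from collections import deque

# Queue names come from the article; the deques stand in for real queues.
POLL_ORDER = ["high_priority.fifo", "normal.fifo", "low_priority.fifo"]
QUEUES = {name: deque() for name in POLL_ORDER}


def next_job(queues=QUEUES, order=POLL_ORDER):
    """Return (queue_name, job) from the highest-priority non-empty queue."""
    for name in order:
        if queues[name]:
            return name, queues[name].popleft()
    return None  # all queues are empty
```

Because the loop always restarts from the top, a steady stream of payment jobs would keep report jobs waiting indefinitely. If that matters, one common alternative is weighted polling, where lower-priority queues are guaranteed a small share of polls.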

Dead Letter Queue (DLQ)

Message processing fails:
  Retry 1: immediate
  Retry 2: 30 seconds later
  Retry 3: 5 minutes later
  Retry 4: 1 hour later
  Retry 5: → Dead Letter Queue

DLQ: Manual inspection + replay

# AWS SQS DLQ config
RedrivePolicy:
  deadLetterTargetArn: arn:aws:sqs:<region>:<account-id>:dlq-orders
  maxReceiveCount: 5
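
The retry schedule in this section can be expressed as a small lookup. This is a sketch under one assumption: the consumer (or a scheduler in front of it) tracks how many times a message has failed. The `next_step` helper is hypothetical and not part of any queue SDK.

```python
# Delays in seconds before retries 1 to 4, taken from the schedule above.
RETRY_DELAYS = [0, 30, 5 * 60, 60 * 60]


def next_step(failures):
    """Given how many times a message has failed, decide what happens next.

    Returns ("retry", delay_seconds) or ("dlq", None).
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    if failures <= len(RETRY_DELAYS):
        return "retry", RETRY_DELAYS[failures - 1]
    # Fifth failure: stop retrying and route to the dead letter queue.
    return "dlq", None
```

For example, `next_step(2)` returns `("retry", 30)` and `next_step(5)` returns `("dlq", None)`.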

Poison Message Detection

A poison message is one that always causes the consumer to crash:

Detect: Same message_id retried > N times
Action:
  Move to poison queue
  Alert engineering team
  Consumer continues (not stuck)
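
A minimal sketch of the detect-and-quarantine flow above. It counts failed attempts per `message_id` and treats a message as poison once the count exceeds `max_attempts` (the article's N; the default of 3 is an assumption). The `PoisonDetector` class and its in-memory lists are hypothetical stand-ins for a real poison queue and alerting system.

```python
from collections import Counter


class PoisonDetector:
    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts  # the article's N (value assumed)
        self.attempts = Counter()         # failed attempts per message_id
        self.poison_queue = []            # stand-in for a real poison queue
        self.alerts = []                  # stand-in for paging/alerting

    def handle(self, message_id, payload, process):
        """Try to process a message; quarantine it after too many failures.

        Returns "ok", "retry", or "poison".
        """
        try:
            process(payload)
        except Exception as exc:
            self.attempts[message_id] += 1
            if self.attempts[message_id] > self.max_attempts:
                self.poison_queue.append((message_id, payload))
                self.alerts.append(f"poison message {message_id}: {exc}")
                del self.attempts[message_id]
                return "poison"
            return "retry"
        self.attempts.pop(message_id, None)
        return "ok"
```

This version only catches Python exceptions. If a message kills the whole process, an in-memory counter is lost, so the count would need to live in the broker (for example, a receive count) or in an external store.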

Back-Pressure

Problem: Producer is faster than the consumers
         → Queue grows without bound → broker runs out of memory (OOM)

Solutions:
  Rate limit producer when queue depth > threshold
  Return HTTP 429 to producer (explicit back-pressure)
  Auto-scale consumers when queue depth > 1000
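
Two of these solutions can be sketched together: rejecting producers with a 429 when the queue is full, and computing a consumer count from queue depth. The scale-out threshold of 1000 comes from the list above; `MAX_DEPTH`, `MSGS_PER_CONSUMER`, and both helper functions are assumptions chosen for illustration.

```python
import math
from collections import deque

MAX_DEPTH = 5000          # assumed rejection threshold
SCALE_THRESHOLD = 1000    # from the article: scale when depth > 1000
MSGS_PER_CONSUMER = 500   # assumed backlog one consumer can absorb


def enqueue(queue, message, max_depth=MAX_DEPTH):
    """Accept the message (202) or push back on the producer (429)."""
    if len(queue) >= max_depth:
        return 429
    queue.append(message)
    return 202


def desired_consumers(depth, current, per_consumer=MSGS_PER_CONSUMER):
    """Scale out only when depth exceeds the threshold; never scale in here."""
    if depth <= SCALE_THRESHOLD:
        return current
    return max(current, math.ceil(depth / per_consumer))
```

Scaling in is deliberately left out of this sketch. In practice it is usually handled separately, with a cooldown, so the consumer count does not oscillate around the threshold.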

Exactly-Once Processing

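Kafka transactional producer (consume-transform-produce):
  init_transactions()            # once, at startup
  begin_transaction()
  produce(topic, message)
  send_offsets_to_transaction()  # consumer offsets join the transaction
  commit_transaction()
  → Atomic: message produced + offset committed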
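
To show why committing the output and the offset together matters, here is a toy in-memory model. It is not the Kafka client API: `ToyTransactionalLog`, `send_offset`, and `process` are hypothetical names, and the "transaction" is just staged state applied in one step.

```python
class ToyTransactionalLog:
    """In-memory stand-in for a broker; not the Kafka client API."""

    def __init__(self):
        self.output = []           # committed output messages
        self.committed_offset = 0  # next input offset to read
        self._staged = None

    def begin_transaction(self):
        self._staged = {"messages": [], "offset": None}

    def produce(self, message):
        self._staged["messages"].append(message)

    def send_offset(self, offset):
        self._staged["offset"] = offset

    def commit_transaction(self):
        # Output and offset become visible in one step.
        self.output.extend(self._staged["messages"])
        if self._staged["offset"] is not None:
            self.committed_offset = self._staged["offset"]
        self._staged = None

    def abort_transaction(self):
        self._staged = None  # nothing staged is ever visible


def process(log, inputs, transform):
    """Consume from the committed offset, transform, and produce atomically."""
    while log.committed_offset < len(inputs):
        offset = log.committed_offset
        log.begin_transaction()
        try:
            log.produce(transform(inputs[offset]))
            log.send_offset(offset + 1)
            log.commit_transaction()
        except Exception:
            log.abort_transaction()
            raise
```

If `transform` fails partway through, the aborted transaction leaves neither output nor offset behind, so a restart resumes from the last committed offset without duplicating earlier results. Side effects outside the log (emails, HTTP calls) are not covered by this guarantee and would still need to be idempotent.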

Queue Metrics

Monitor:
  Queue depth → Is processing keeping up?
  Consumer lag → How far behind?
  DLQ depth   → Are failures accumulating?
  Throughput  → Messages/second
  Age of oldest message → Processing stall?
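
Several of these metrics can be derived from a simple snapshot. The sketch below assumes you can read the enqueue timestamps of waiting messages and a count of messages processed in a recent window; the `queue_metrics` function and its field names are hypothetical.

```python
import time


def queue_metrics(enqueued_at, processed_count, window_seconds, now=None):
    """Summarize queue health from a snapshot.

    enqueued_at: enqueue timestamps (seconds) of messages still waiting.
    processed_count: messages completed during the last window_seconds.
    """
    now = time.time() if now is None else now
    return {
        "queue_depth": len(enqueued_at),
        "oldest_message_age_s": now - min(enqueued_at) if enqueued_at else 0.0,
        "throughput_per_s": processed_count / window_seconds,
    }
```

A rising `oldest_message_age_s` alongside a flat `queue_depth` is often the clearest sign of a stalled consumer, since depth alone can look healthy while one message is stuck.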

Conclusion

Smart queues need priority routing, retries with increasing backoff plus a DLQ for failures, poison message handling, back-pressure on producers, and auto-scaling of consumers based on queue depth.
