Background

During recent system design interviews, I often found myself stuck when reasoning about performance bottlenecks, the solution was almost always to introduce a queue — which decouples fast paths (which need to return immediately to user) from slow paths (e.g., writing to DB). However, a discussion with a friend highlighted that “adding Kafka” is an oversimplified heuristic — different queueing systems (Kafka, traditional MQs, Redis-based queues) have distinct tradeoffs, and the right choice depends on the specific use case.


What problem are we solving when “adding a Q”

Decoupling

Problem: Service A needs to talk to Service B, but if B is down or slow, A gets blocked or fails.

With Q:

User Request → Service A → Service B (slow / down) → ❌ Request stucks / fails

Without Q:

User Request → Service A → Queue → ✓ (A returns immediately)
                              ↓
                         Service B (processes when ready)

Benefits: independent operation, loose coupling

Load Leveling / Spike Absorption

Problem: Traffic spikes overwhelm downstream services, causing cascading failures.

Without Q:

1000 requests/sec → Service B (can handle 100 req/sec) → 💥 Crashes

With Q:

1000 requests/sec → Queue (buffers) → Service B processes at steady 100 req/sec

Benefits: Q as a shock absorber; downstream can catch up later & process at its own rate

Async + Retry

Problem: