Common Follow-ups
Q: “How would you handle a flash sale with 100x normal traffic?”
Answer: “For flash sales, I’d implement several strategies:
Pre-warm cache
- Load popular items into Redis before sale starts
- Reduce cache misses during traffic spike
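A minimal sketch of the pre-warm step, assuming redis-py and a hypothetical list of flash-sale items fetched ahead of time:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def prewarm_flash_sale_cache(items, ttl_seconds=3600):
    """Load item details into Redis before the sale starts.

    `items` is assumed to be a list of dicts with an "item_id" key,
    e.g. the flash-sale catalog pulled from the product service.
    """
    pipe = r.pipeline()
    for item in items:
        pipe.set(f"item:{item['item_id']}", json.dumps(item), ex=ttl_seconds)
    pipe.execute()  # single round trip for the whole batch
```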
Aggressive rate limiting
- Normal: 1000 req/min per user
- Flash sale: 100 req/min per user
- Protects backend from overload
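One simple way to enforce the tighter limit is a fixed-window counter per user in Redis; a sketch assuming redis-py:

```python
import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int, window_seconds: int = 60) -> bool:
    """Fixed-window rate limiter: at most `limit` requests per window."""
    key = f"ratelimit:{user_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # start the window on the first request
    return count <= limit

# Normal traffic: allow_request(user_id, limit=1000)
# Flash sale:     allow_request(user_id, limit=100)
```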
Queue system
- Add items to queue for async processing
- Process requests in order instead of real-time
- User sees queue position and estimated wait time
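A rough sketch of the waiting-room queue using a Redis list (redis-py assumed); the list length returned by the push doubles as the user's position:

```python
import redis

r = redis.Redis()
QUEUE_KEY = "flashsale:checkout_queue"

def enqueue_checkout(user_id: str) -> int:
    """Add the user to the queue; RPUSH returns their 1-based position."""
    return r.rpush(QUEUE_KEY, user_id)

def estimated_wait_seconds(position: int, throughput_per_second: int = 50) -> int:
    """Rough wait estimate to show the user (the throughput figure is an assumption)."""
    return position // throughput_per_second

# A worker process drains the queue in order:
# next_user = r.lpop(QUEUE_KEY)
```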
Auto-scaling
- Scale API servers from 10 to 50 based on CPU metrics
- Use Kubernetes HPA or AWS Auto Scaling
Read-only mode
- Temporarily disable non-critical writes
- Focus capacity on cart operations
CDN for static content
- Serve product images and details from CDN
- Reduces load on application servers
Most importantly, communicate with users: show queue position and estimated wait time, and set expectations.”
Q: “What if two users try to add the last item in stock simultaneously?”
Answer: “This is a classic race condition. Solutions:
Option 1: Reserve inventory first (before adding to cart)
- Try to reserve inventory via inventory service
- If reservation fails, return out of stock error
- If succeeds, add to cart with reservation
- Release reservation after 10 minutes if not checked out
- Prevents overselling at cart level
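A sketch of the reserve-then-add flow, assuming a hypothetical inventory_client with a reserve() call and a Redis-backed cart:

```python
import redis

r = redis.Redis()
RESERVATION_TTL = 600  # release after 10 minutes if not checked out

def add_to_cart(user_id: str, item_id: str, inventory_client) -> bool:
    # Hypothetical inventory-service call; returns a reservation id or None.
    reservation_id = inventory_client.reserve(item_id, quantity=1, ttl=RESERVATION_TTL)
    if reservation_id is None:
        return False  # reservation failed: item is out of stock

    # Store the item alongside its reservation so checkout can confirm it later.
    r.hset(f"cart:{user_id}", item_id, reservation_id)
    return True
```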
Option 2: Atomic conditional update in inventory service
- UPDATE inventory SET quantity = quantity - 1 WHERE item_id = ? AND quantity >= 1
- Only one update succeeds: the conditional decrement is atomic, and the row lock serializes concurrent requests
- Second request gets 0 rows affected
- Return out of stock to second user
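The same conditional UPDATE executed from application code; a sketch assuming a psycopg2-style connection to the inventory database:

```python
def decrement_stock(conn, item_id: str) -> bool:
    """Atomically take one unit; returns False if another request got the last one."""
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE inventory SET quantity = quantity - 1 "
            "WHERE item_id = %s AND quantity >= 1",
            (item_id,),
        )
        conn.commit()
        return cur.rowcount == 1  # 0 rows affected means the other request won
```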
Option 3: Accept over-selling, handle at checkout
- Let both users add to cart
- Don’t reserve inventory at cart level
- At checkout, first to pay wins
- Second gets out of stock error with voucher compensation
For Shopee, I’d use Option 3: it gives the better user experience, and the edge case is handled at checkout with compensation for disappointed users.”
Q: “How do you prevent bots from adding items to cart and not buying?”
Answer: “Bot protection strategy:
Rate limiting per IP
- Block IPs making >1000 requests/hour
- Use progressive rate limiting (slow down, then block)
CAPTCHA
- Show CAPTCHA after suspicious behavior detected
- Don’t show on every request (bad UX)
Behavioral analysis
- Bots add items in milliseconds
- Real users browse for seconds
- Track time between actions
- Flag accounts with bot-like patterns
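A very rough sketch of the time-between-actions signal, assuming per-account action timestamps are already being collected:

```python
from statistics import median

def looks_like_bot(action_timestamps: list[float], threshold_seconds: float = 0.5) -> bool:
    """Flag accounts whose median gap between actions is implausibly fast.

    `action_timestamps` is assumed to be a sorted list of epoch seconds for one
    account's recent add-to-cart actions; the threshold is an assumption to tune.
    """
    if len(action_timestamps) < 5:
        return False  # not enough data to judge
    gaps = [later - earlier for earlier, later in zip(action_timestamps, action_timestamps[1:])]
    return median(gaps) < threshold_seconds
```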
Temporary cart holds
- Release cart items after 30 minutes of inactivity
- Run cleanup job to delete old cart items
- Prevents inventory hoarding
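A sketch of the cleanup job, assuming PostgreSQL and a cart_items table with a last_updated_at column (table and column names are assumptions):

```python
def release_stale_carts(conn, batch_size: int = 1000):
    """Delete cart items untouched for 30+ minutes, in small batches."""
    with conn.cursor() as cur:
        while True:
            cur.execute(
                "DELETE FROM cart_items WHERE id IN ("
                "  SELECT id FROM cart_items"
                "  WHERE last_updated_at < NOW() - INTERVAL '30 minutes'"
                "  LIMIT %s)",
                (batch_size,),
            )
            conn.commit()
            if cur.rowcount == 0:
                break  # nothing stale left to release
```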
Device fingerprinting
- Track device fingerprint, not just IP
- Bots often use same device signature
- Block by fingerprint, not just IP
Require login for high-demand items
- For limited releases, require authentication
- Harder for bots to create accounts at scale”
Q: “Your cache is showing 60% hit rate instead of 95%. How do you debug?”
Answer: “Systematic debugging approach:
Check cache TTL
- Run: redis-cli TTL cart:user_123
- If the TTL is very short, entries expire before they are re-read
- A result of -2 means the key is already gone; -1 means no expiry is set
- Fix: Adjust TTL to appropriate value
Check cache eviction
- Run: redis-cli INFO stats | grep evicted_keys
- A high eviction count means Redis is running out of memory
- Fix: Add more Redis nodes or increase memory
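The same INFO stats section also exposes hit/miss counters, which makes it easy to confirm the 60% figure from code; a sketch assuming redis-py:

```python
import redis

r = redis.Redis()

def cache_stats() -> dict:
    """Read hit rate and eviction count from Redis' own counters."""
    stats = r.info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    total = hits + misses
    return {
        "hit_rate": hits / total if total else 0.0,  # ~0.60 in this scenario
        "evicted_keys": stats["evicted_keys"],       # high value points to memory pressure
    }
```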
Check access patterns
- Query: SELECT user_id, COUNT(*) FROM access_logs GROUP BY user_id
- Are we caching the right users? (80/20 rule)
- Fix: Cache only active users, not all users
Check invalidation logic
- Are we invalidating too aggressively?
- Every write = cache delete = next read is miss
- Fix: Reduce unnecessary invalidations
Check cache warming
- For popular users, pre-load cache
- Morning traffic spike = cache cold start
- Fix: Warm cache before peak hours
Monitor cache key distribution
- Run: redis-cli --bigkeys
- Are some keys huge, causing memory issues?
- Fix: Split large keys into smaller ones
Solution depends on root cause:
- Low memory: Add more Redis nodes
- Poor eviction policy: Switch to an LRU policy (e.g. allkeys-lru in Redis)
- Wrong caching strategy: Rethink what to cache”
Q: “How would you migrate 10M carts from old schema to new schema with zero downtime?”
Answer: “Zero-downtime migration strategy:
Phase 1: Dual writes (Week 1-2)
- Write to both old and new schema simultaneously
- Old schema remains source of truth
- New schema receives copies for testing
- Monitor for errors in new schema writes
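A sketch of the dual-write wrapper, with the new-schema write kept non-fatal while the old schema stays the source of truth (the store objects and transform are hypothetical):

```python
import logging

log = logging.getLogger("cart.migration")

def save_cart(cart, old_store, new_store):
    """Phase 1: old schema is authoritative; new schema gets a best-effort copy."""
    old_store.save(cart)  # must succeed: still the source of truth
    try:
        new_store.save(transform(cart))  # hypothetical old -> new mapping
    except Exception:
        # Never fail the user request because the new-schema write broke;
        # log it so the error rate can be monitored during Phase 1.
        log.exception("dual-write to new schema failed for cart %s", cart.id)

def transform(cart):
    """Placeholder for the old -> new schema transformation."""
    return cart
```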
Phase 2: Backfill (Week 2-3)
- Background job migrates old data in batches
- Process 1000 carts at a time to avoid overload
- Transform old schema format to new schema
- Mark migrated records in old schema
- Continue until all old data migrated
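A sketch of the batch backfill job, assuming PostgreSQL, a migrated flag on the old table, and a unique cart_id in the new table (all names are assumptions):

```python
import time

def backfill_carts(conn, batch_size: int = 1000, pause_seconds: float = 0.5):
    """Copy unmigrated carts from the old schema to the new one, batch by batch."""
    with conn.cursor() as cur:
        while True:
            cur.execute(
                "SELECT cart_id, payload FROM old_carts WHERE migrated = FALSE LIMIT %s",
                (batch_size,),
            )
            rows = cur.fetchall()
            if not rows:
                break  # everything has been migrated
            for cart_id, payload in rows:
                cur.execute(
                    "INSERT INTO new_carts (cart_id, payload) VALUES (%s, %s) "
                    "ON CONFLICT (cart_id) DO NOTHING",  # dual writes may have gotten here first
                    (cart_id, payload),
                )
                cur.execute(
                    "UPDATE old_carts SET migrated = TRUE WHERE cart_id = %s",
                    (cart_id,),
                )
            conn.commit()
            time.sleep(pause_seconds)  # throttle so the backfill doesn't overload the primary
```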
Phase 3: Dual reads (Week 3-4)
- Try reading from new schema first
- If data found, return it
- If not found, fallback to old schema
- Verify data consistency between schemas
- Monitor error rates closely
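A sketch of the dual-read path with fallback and a consistency check (the store helpers are hypothetical):

```python
import logging

log = logging.getLogger("cart.migration")

def load_cart(user_id, old_store, new_store):
    """Phase 3: prefer the new schema, fall back to the old one."""
    cart = new_store.get(user_id)
    if cart is not None:
        old_cart = old_store.get(user_id)
        if old_cart is not None and cart != old_cart:
            # Surface mismatches so consistency can be verified before the final switch.
            log.warning("schema mismatch for user %s", user_id)
        return cart
    return old_store.get(user_id)  # not yet in the new schema
```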
Phase 4: Switch (Week 4)
- All reads and writes go to new schema only
- Old schema kept as backup
- Monitor for 1 week for any issues
- Have rollback plan ready
Phase 5: Cleanup (Week 5)
- After verifying everything works
- Drop old schema tables
- Remove dual-write code
- Update documentation
Key principles:
- Always maintain backward compatibility
- Feature flags to enable instant rollback
- Monitor error rates at each phase
- Have rollback plan ready at every step
- Test migration on staging environment first”