Scaling to a Billion Requests
When our client came to us with a monolithic application struggling under 10,000 concurrent users, we knew a fundamental architecture change was needed. Here's how we scaled it to handle 1 billion daily requests.
The Starting Point
The original application was a typical monolith:
- Single Node.js application
- PostgreSQL database
- Deployed on a single EC2 instance
- Response times averaging 2-3 seconds under load
The Migration Strategy
We didn't do a big-bang rewrite. Instead, we used the Strangler Fig pattern:
1. Identify service boundaries: we mapped the domain using event storming sessions.
2. Extract services incrementally: we started with the highest-traffic endpoints.
3. Build the platform: Kubernetes cluster, service mesh, observability stack.
4. Migrate traffic gradually: using feature flags and traffic splitting (see the sketch after this list).
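To make step 4 concrete, here is a minimal TypeScript sketch of percentage-based traffic splitting behind a feature flag during a strangler-fig migration. The service names, URLs, and rollout percentages are illustrative assumptions, not details from the actual project; in practice the percentages would come from a feature-flag service so they can change without a deploy.

```typescript
// Sketch: route a percentage of users to an extracted microservice, the rest to the monolith.
type Route = { legacyUrl: string; newUrl: string; rolloutPercent: number };

// Hypothetical routes; rollout values would normally live in a feature-flag service or config map.
const routes: Record<string, Route> = {
  catalog: { legacyUrl: "http://monolith:3000/api/catalog", newUrl: "http://catalog-service:8080/catalog", rolloutPercent: 25 },
  users:   { legacyUrl: "http://monolith:3000/api/users",   newUrl: "http://user-service:8080/users",     rolloutPercent: 100 },
};

// Hash the user id so the same user consistently lands on the same implementation.
function bucket(userId: string): number {
  let h = 0;
  for (const c of userId) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100;
}

export function resolveUpstream(routeName: string, userId: string): string {
  const route = routes[routeName];
  if (!route) throw new Error(`unknown route: ${routeName}`);
  return bucket(userId) < route.rolloutPercent ? route.newUrl : route.legacyUrl;
}

// Example: roughly 25% of users are served by the extracted catalog service.
console.log(resolveUpstream("catalog", "user-42"));
```

Sticky bucketing matters here: random splitting per request would bounce a single user between old and new behavior mid-session.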
Key Architecture Decisions
Service Mesh (Istio): We chose Istio for inter-service communication. It gave us automatic mTLS, traffic management, and observability without modifying application code.
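As a rough illustration (not the project's actual manifests), mesh-wide mTLS and weighted traffic shifting in Istio might look like the following; the catalog service and its v1/v2 subsets are hypothetical, and the subsets would be defined in a matching DestinationRule.

```yaml
# Strict mTLS for all service-to-service traffic, applied in the Istio root namespace;
# no application code changes required.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Traffic management: shift 10% of traffic to a new version of a (hypothetical) catalog service.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: catalog
spec:
  hosts:
    - catalog
  http:
    - route:
        - destination:
            host: catalog
            subset: v1
          weight: 90
        - destination:
            host: catalog
            subset: v2
          weight: 10
```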
Event-Driven Architecture: We introduced Apache Kafka for asynchronous communication between services. This decoupled services and provided natural backpressure handling.
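Here is a hedged sketch of that pattern using the kafkajs client (v2-style API). Broker addresses, topic names, and consumer groups are assumptions for the example, not the project's real ones.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "order-service", brokers: ["kafka:9092"] });
const producer = kafka.producer();

// Producer side: instead of calling billing or notifications synchronously, the order
// service publishes a domain event and moves on.
export async function publishOrderPlaced(orderId: string, userId: string): Promise<void> {
  await producer.connect(); // in a real service, connect once at startup and reuse the producer
  await producer.send({
    topic: "orders.placed",
    messages: [{ key: orderId, value: JSON.stringify({ orderId, userId, placedAt: Date.now() }) }],
  });
}

// Consumer side: each downstream service subscribes with its own consumer group and
// processes at its own pace; lag on that group is the natural backpressure signal.
export async function runBillingConsumer(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "billing-service" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["orders.placed"], fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value?.toString() ?? "{}");
      console.log("billing order", event.orderId); // charge customer, emit a follow-up event, etc.
    },
  });
}
```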
Database Per Service: Each microservice got its own database, chosen based on its data patterns:
- User service → PostgreSQL
- Product catalog → MongoDB
- Search → Elasticsearch
- Session management → Redis
- Analytics → ClickHouse
Kubernetes Configuration
Our cluster configuration evolved significantly:
- Horizontal Pod Autoscaler with custom metrics, not just CPU (see the example after this list)
- Pod Disruption Budgets for zero-downtime deployments
- Resource quotas per namespace to prevent noisy neighbors
- Topology spread constraints for high availability
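For example, request-rate-based autoscaling can be expressed roughly like this with the autoscaling/v2 API. It assumes a custom-metrics adapter (such as Prometheus Adapter) exposes an http_requests_per_second metric for the pods; the names, replica counts, and target value are illustrative, not our production settings.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 4
  maxReplicas: 200
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"   # scale out when pods average more than 500 req/s
```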
Results
After 6 months of incremental migration:
- Response times dropped from 2-3s to <100ms (p99)
- System handles 1B+ requests/day
- 99.99% uptime over 12 months
- Infrastructure costs actually decreased by 20% (better resource utilization)
- Deployment frequency went from monthly to 50+ times per day
Lessons Learned
1. Start with observability: you can't improve what you can't measure.
2. Don't over-decompose: too many services create complexity of their own.
3. Invest in developer experience: local development with microservices is hard; make it easy.
4. Automate everything: GitOps, CI/CD, infrastructure as code.
5. Plan for failure: circuit breakers, retries, and fallbacks are not optional (see the sketch after this list).
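To illustrate the last point, here is a minimal hand-rolled circuit breaker with a fallback in TypeScript. In practice you would usually reach for a library such as opossum, or for mesh-level retries and outlier detection in Istio; the thresholds below are arbitrary placeholders.

```typescript
type State = "closed" | "open" | "half-open";

export class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,   // consecutive failures before opening
    private readonly resetTimeoutMs = 10_000 // how long to stay open before a probe request
  ) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) return fallback();
      this.state = "half-open"; // let one probe request through
    }
    try {
      const result = await fn();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = Date.now();
      }
      return fallback();
    }
  }
}

// Usage sketch: guard a call to a downstream service and degrade to an empty list.
// The breaker is shared so its state persists across calls.
const catalogBreaker = new CircuitBreaker();

async function getProducts(): Promise<unknown[]> {
  return catalogBreaker.call(
    () => fetch("http://catalog-service:8080/products").then((r) => r.json()),
    () => [] // fallback: degraded but still responsive
  );
}
```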
Microservices aren't a silver bullet, but when done right, they enable a level of scalability and agility that monoliths simply can't match.