What Is a Load Balancer? Distributing Traffic Like a Pro
A load balancer is a device or service that distributes incoming network traffic across multiple backend servers to ensure no single server becomes overwhelmed. If one server goes down, the load balancer redirects traffic to the remaining healthy servers, providing fault tolerance and high availability. Load balancers sit between clients and your server pool, acting as an invisible traffic cop that routes each request to the best available backend. Every high-traffic website, from Google to Netflix, runs behind load balancers.
Why Load Balancers Matter
A single server has limits: CPU, memory, network bandwidth, connection capacity. When you outgrow one server, you have two choices:
Vertical scaling: Bigger server (more CPU, more RAM). Works up to a point, then you hit hardware limits and cost becomes prohibitive.
Horizontal scaling: More servers behind a load balancer. Add servers as demand grows, remove them when it drops. No single point of failure. This is how the modern internet scales.
Load balancers enable horizontal scaling by distributing requests across your server pool.
Load Balancing Algorithms
Round Robin
Requests are distributed sequentially to each server in rotation. Server 1, Server 2, Server 3, Server 1, Server 2, Server 3… Simple and works well when servers are identical and requests are similar in complexity.
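The rotation above can be sketched in a few lines of Python (backend names are hypothetical):

```python
from itertools import cycle

# Hypothetical backend pool for illustration.
servers = ["server1", "server2", "server3"]
rotation = cycle(servers)

def next_server():
    """Return the next backend in strict rotation."""
    return next(rotation)

# Six requests cycle through the pool twice.
assignments = [next_server() for _ in range(6)]
print(assignments)  # ['server1', 'server2', 'server3', 'server1', 'server2', 'server3']
```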
Least Connections
New requests go to the server with the fewest active connections. Better than round robin when request processing time varies significantly (some requests are quick, others take seconds).
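A minimal sketch of the selection step, assuming the balancer tracks a live count of connections per backend (the counts here are made up):

```python
# Active connection counts per backend — hypothetical state a balancer would track.
active = {"server1": 4, "server2": 1, "server3": 3}

def pick_least_connections(active):
    """Choose the backend with the fewest active connections."""
    return min(active, key=active.get)

target = pick_least_connections(active)  # 'server2' has only 1 active connection
active[target] += 1  # count the new request against the chosen backend
```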
Weighted Round Robin / Weighted Least Connections
Same as above, but servers are assigned different weights based on capacity: a server with 32 cores gets twice the weight of a server with 16 cores.
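A naive way to sketch weighted round robin is to repeat each backend in the cycle according to its weight (server names and weights here are hypothetical):

```python
from itertools import cycle

# Weights proportional to capacity: the 32-core box gets twice the weight
# of the 16-core box. Names are made up for illustration.
weights = {"big-server": 2, "small-server": 1}

# Naive weighted round robin: repeat each backend by its weight.
schedule = cycle([s for s, w in weights.items() for _ in range(w)])

picks = [next(schedule) for _ in range(6)]
# big-server receives twice as many requests as small-server
```

Production balancers typically use a smoother variant (for example, Nginx's smooth weighted round robin) so the heavier server's requests are spread out rather than sent back to back.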
IP Hash
The client’s IP determines which server receives the request (hash of IP modulo server count). Ensures the same client always hits the same server — useful for session persistence without shared session storage.
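The hash-modulo mapping might look like this sketch (pool names are hypothetical; `hashlib` is used rather than Python's built-in `hash()`, which is randomized per process):

```python
import hashlib

servers = ["server1", "server2", "server3"]  # hypothetical pool

def server_for(client_ip, servers):
    """Deterministically map a client IP onto the pool: hash(IP) mod server count."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always lands on the same backend.
assert server_for("203.0.113.7", servers) == server_for("203.0.113.7", servers)
```

One caveat worth knowing: with plain modulo, changing the pool size remaps most clients to different servers, which is why some balancers use consistent hashing instead.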
Least Response Time
Routes to the server with the fastest average response time and fewest active connections. Requires real-time health monitoring.
Layer 4 vs Layer 7
Layer 4 (Transport) load balancers route based on TCP/UDP connection data (source IP, destination IP, ports). They don’t inspect the actual content. Very fast, very efficient, but can’t make content-aware decisions.
Layer 7 (Application) load balancers inspect HTTP headers, URL paths, cookies, and even request bodies. They can route /api/* requests to API servers and /images/* to CDN origins. They can also do SSL termination, compression, and caching.
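The path-based routing described above can be sketched as a simple rule table (pool names and paths are illustrative, not a real balancer's API):

```python
# Hypothetical backend pools for a Layer 7, path-based routing rule.
API_POOL = ["api1", "api2"]
IMAGE_POOL = ["cdn-origin1"]
DEFAULT_POOL = ["web1", "web2"]

def route(path):
    """Inspect the request path (Layer 7 data) and pick a backend pool."""
    if path.startswith("/api/"):
        return API_POOL
    if path.startswith("/images/"):
        return IMAGE_POOL
    return DEFAULT_POOL
```

A Layer 4 balancer could not make this decision at all: it never sees the URL, only IPs and ports.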
Most modern load balancers operate at Layer 7. The performance cost is worth the routing intelligence.
Types of Load Balancers
Hardware: Dedicated appliances from F5 (BIG-IP), Citrix (NetScaler). Expensive but powerful. Common in legacy enterprise environments.
Software: Nginx, HAProxy, Envoy, Traefik. Run on commodity servers or containers. The standard for modern infrastructure.
Cloud-native: AWS ALB/NLB, Azure Load Balancer, GCP Load Balancing. Fully managed, auto-scaling, integrated with cloud services. The easiest option if you’re already in the cloud.
DNS-based: Route 53 weighted routing, Cloudflare Load Balancing. Distributes at the DNS level, routing users to different server pools based on geography, health, or weight.
Health Checks
Load balancers continuously monitor backend servers with health checks:
- TCP check: Can I connect to port 80? (Basic liveness)
- HTTP check: Does /health return 200 OK? (Application readiness)
- Custom check: Does the response body contain “healthy”? (Deep health verification)
If a server fails health checks, the load balancer removes it from the pool. When it recovers, it’s automatically added back. This happens transparently to users.
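An HTTP health check can be sketched like this, assuming each backend exposes a /health endpoint (the endpoint name and timeout are assumptions):

```python
import urllib.request
import urllib.error

def is_healthy(base_url, timeout=2.0):
    """HTTP health check: does GET /health return 200 OK within the timeout?"""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: treat the backend as down.
        return False
```

A real balancer would run this on a schedule and require several consecutive failures before ejecting a server, to avoid flapping on a single slow response.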
Test It Yourself
Check Server Headers
See if a website uses a load balancer by checking response headers like X-Served-By and Server.
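One way to sketch this check in Python: collect the response headers and scan for names that proxies, CDNs, and balancers commonly add (the hint list below is illustrative and non-exhaustive):

```python
# Header names that commonly hint at a load balancer, proxy, or CDN in front.
# This list is an illustrative assumption, not a complete catalog.
LB_HINTS = ("x-served-by", "via", "x-cache", "cf-ray", "x-amzn-trace-id")

def lb_hints(headers):
    """Return the hint headers present in a response's header mapping."""
    lowered = {k.lower() for k in headers}
    return [h for h in LB_HINTS if h in lowered]

# With a live request, the mapping could come from urllib.request.urlopen(...).headers;
# here a sample response illustrates the idea.
sample = {"Server": "nginx", "X-Served-By": "cache-lhr-1", "Via": "1.1 varnish"}
print(lb_hints(sample))  # ['x-served-by', 'via']
```

A bare `Server: nginx` header alone proves little, since nginx can be the origin server itself; the forwarding headers are the stronger signal.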