Load balancing is a technique used in distributed systems to distribute incoming network traffic or workload across multiple servers or resources.
Load balancers can be placed between:
- user & server
- internal platform layers, or between microservices
- backend servers & cache
- backend servers & DB
Why Load Balancers?
- higher throughput
- lower wait and response times
- better resource utilization, avoiding overload on any single server
- increased reliability
- smart load balancers can predict traffic in advance and scale capacity ahead of demand
Algorithms
Typically, health checks run continuously so that failing workers are removed from the pool and healthy ones are added back.
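As a rough illustration of the health-check idea, the sketch below keeps a set of healthy workers and updates it from a probe function. The `WorkerPool` class, the `probe` callable, and the worker names are hypothetical. A real load balancer would probe over the network (for example with an HTTP or TCP check) and would usually wait for several consecutive failures before removing a worker.

```python
from typing import Callable, Iterable, Set


class WorkerPool:
    """Tracks which workers are currently eligible to receive traffic."""

    def __init__(self, workers: Iterable[str], probe: Callable[[str], bool]):
        self.all_workers = list(workers)            # every known worker
        self.healthy: Set[str] = set(self.all_workers)
        self.probe = probe                          # hypothetical check: True if the worker is up

    def run_health_checks(self) -> None:
        """Probe every known worker once and update the healthy set."""
        for worker in self.all_workers:
            if self.probe(worker):
                self.healthy.add(worker)            # recovered workers rejoin the pool
            else:
                self.healthy.discard(worker)        # failing workers stop receiving traffic


# Usage: simulate worker "b" going down and then recovering.
down = {"b"}
pool = WorkerPool(["a", "b", "c"], probe=lambda w: w not in down)

pool.run_health_checks()
healthy_while_b_down = sorted(pool.healthy)     # ['a', 'c']

down.clear()
pool.run_health_checks()
healthy_after_recovery = sorted(pool.healthy)   # ['a', 'b', 'c']
```

The selection algorithms would then choose only among `pool.healthy`.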
- Least Connection Method
  - direct traffic to the worker with the fewest active connections
  - good for persistent (long-lived) connections
- Least Response Time Method
  - direct traffic to the worker with the lowest average response time
- Least Bandwidth Method
  - direct traffic to the worker currently serving the least amount of traffic
- Round Robin Method
  - cycle through the workers in order, sending each new request to the next one
- Weighted Round Robin Method
  - each worker gets a weight based on its capacity; more capable workers get higher weights
  - workers with higher weights receive proportionally more connections
- IP Hash Method
  - hash the client IP (or a cookie, user ID, or another client-side identifier) to determine which worker handles the request
  - a given client always reaches the same worker, so per-worker caches are used effectively
  - use Consistent Hashing so that adding or removing a worker remaps only a small share of clients
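To make a few of the methods above concrete, here is a minimal sketch of round robin, weighted round robin, least connections, and hash-based routing with a consistent hash ring. All function and class names, the worker names, and the `replicas` parameter are assumptions made for illustration. The weighted variant is the naive "repeat each worker by its weight" approach, not the smoother interleaving that production balancers tend to use.

```python
import bisect
import hashlib
import itertools
from typing import Dict, List


def round_robin(workers: List[str]):
    """Yield workers in order, wrapping around forever."""
    return itertools.cycle(workers)


def weighted_round_robin(weights: Dict[str, int]):
    """Naive weighted round robin: each worker appears `weight` times per cycle."""
    expanded = [w for w, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the worker with the fewest active connections."""
    return min(active, key=active.get)


class ConsistentHashRing:
    """Maps client keys to workers so that removing a worker only moves its own keys."""

    def __init__(self, workers: List[str], replicas: int = 100):
        # Each worker is placed at `replicas` points on the ring (virtual nodes).
        self._ring = sorted(
            (self._hash(f"{w}#{i}"), w) for w in workers for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def worker_for(self, client_key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._points, self._hash(client_key))
        return self._ring[idx % len(self._ring)][1]


rr = round_robin(["a", "b", "c"])
rr_picks = [next(rr) for _ in range(4)]                   # ['a', 'b', 'c', 'a']

wrr = weighted_round_robin({"big": 2, "small": 1})
wrr_picks = [next(wrr) for _ in range(3)]                 # ['big', 'big', 'small']

lc_pick = least_connections({"a": 5, "b": 2, "c": 7})     # 'b'

ring = ConsistentHashRing(["a", "b", "c"])
same_worker = ring.worker_for("203.0.113.7") == ring.worker_for("203.0.113.7")  # True
```

With the ring, removing worker `c` only reassigns the clients that were on `c`; clients mapped to `a` or `b` stay where they were, which is why it is preferred over a plain `hash(key) % number_of_workers`.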
Redundant Load Balancers
Keep a standby load balancer alongside the active one in a cluster. If the active one fails, the standby takes over, so the load balancer itself is no longer a single point of failure.
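A minimal sketch of the active/standby idea, assuming each load balancer exposes a liveness flag and that the position in the list is the failover priority. The `LoadBalancerNode` class, the `elect_active` function, and the node names are hypothetical; in practice liveness would come from heartbeats exchanged between the nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadBalancerNode:
    name: str
    alive: bool = True   # in practice this would be set by a heartbeat check


def elect_active(nodes: List[LoadBalancerNode]) -> Optional[LoadBalancerNode]:
    """Return the first live node in priority order, or None if all are down."""
    for node in nodes:
        if node.alive:
            return node
    return None


primary = LoadBalancerNode("lb-primary")
standby = LoadBalancerNode("lb-standby")
cluster = [primary, standby]                  # list order is the failover priority

active_before = elect_active(cluster).name    # 'lb-primary'
primary.alive = False                         # simulate the primary failing
active_after = elect_active(cluster).name     # 'lb-standby'
```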
Stateful Load Balancers
- unlike stateless load balancers, stateful load balancers keep session information as state
- once a user/session is directed to a worker, subsequent requests from that session are directed to the same worker
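The session-pinning behavior can be sketched as a small routing table. The `StickyLoadBalancer` class and the session IDs are hypothetical, and round robin for new sessions is just one possible choice. The dictionary is the "state": a stateless balancer would not need to store it.

```python
import itertools
from typing import Dict, List


class StickyLoadBalancer:
    """Round-robins new sessions, then pins each session to its first worker."""

    def __init__(self, workers: List[str]):
        self._next_worker = itertools.cycle(workers)
        self._sessions: Dict[str, str] = {}   # session id -> assigned worker

    def route(self, session_id: str) -> str:
        if session_id not in self._sessions:
            # First request of this session: assign the next worker in rotation.
            self._sessions[session_id] = next(self._next_worker)
        return self._sessions[session_id]


lb = StickyLoadBalancer(["a", "b"])
first = lb.route("session-1")    # 'a': new session, first worker in rotation
second = lb.route("session-2")   # 'b': new session, next worker
again = lb.route("session-1")    # 'a': existing session stays on its worker
```

One consequence of keeping this table is that it has to be shared or replicated if a redundant load balancer is expected to take over without dropping sessions.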