[How to Build Tech #01] The Heart of the Web: Build a Load Balancer (with Implementation Code) and How It Actually Works
A deep dive into the implementation...
In modern web architecture, scalability and reliability aren’t optional—they’re essential. A load balancer is a critical component that makes this possible. It acts as the “traffic cop” for your network, intelligently distributing incoming requests across a group of backend servers.
This prevents any single server from becoming a bottleneck, ensuring your application remains fast, responsive, and highly available, even under heavy traffic.
Real-World Analogy
Imagine a popular bank with only one teller. As more customers arrive, the line grows, wait times skyrocket, and the teller becomes overwhelmed.
Now, imagine the bank opens ten teller windows. A load balancer is the queue management system at the entrance. It looks at all ten tellers and directs the next customer to the teller who is available, ensuring the workload is spread evenly. If one teller goes on break (or their computer crashes), the system simply stops sending customers their way until they’re back online.
What You’ll Build
In this guide, we'll build a simple HTTP load balancer in Python from scratch, production-ready in concept if not in hardening.
You will learn how to implement the core components:
Backend Servers: Simple HTTP servers that do the “real” work.
Health Checking: A system to automatically detect and remove failed servers from the pool.
Routing Algorithm: A Round-Robin strategy to distribute requests fairly.
Request Forwarding: The core proxy logic to pass client requests to a healthy server.
Concurrency: Using a threaded server to handle multiple client requests at once.
Technology Stack Interconnections
I'll use Python's standard http.server module plus the third-party requests library to illustrate the core concepts, showing how they connect:
Python HTTP Ecosystem:
http.server (client-facing)   ←→   Load Balancer Core            ←→   requests (backend-facing)
Listens on port 8000                Routing logic (Round-Robin)        Forwards the HTTP request
Handles client HTTP                 Manages the server pool            Gets the response from 8001, etc.
Needs to be threaded!               Tracks healthy servers             Handles backend errors
Part 1: The Foundation (Backend Servers)
Theory: Before we can balance any load, we need a “load” to balance. These are your backend servers (also called “upstream” servers). In a real application, these servers would run your main business logic (e.g., a Node.js API, a Python Django app, etc.). For our guide, we’ll create simple Python web servers that identify themselves.
Connection: The load balancer must know the address (IP and port) of each backend server. It will treat them as interchangeable units. A crucial part of this is a /health endpoint, which the load balancer will ping to see if the server is alive.
💻 Code: backend_server.py
Create this file. We’ll run it multiple times on different ports (8001, 8002, etc.).
Python
# backend_server.py
import sys
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Get the port from the command-line arguments, default to 8001
PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 8001
SERVER_ID = f"Server_on_Port_{PORT}"

class SimpleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        """Handle GET requests."""
        if self.path == '/health':
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"status": "ok", "server_id": SERVER_ID}).encode('utf-8'))
        elif self.path == '/':
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            response_data = {
                "message": f"Hello from {SERVER_ID}!",
                "path": self.path,
                "headers": dict(self.headers)
            }
            self.wfile.write(json.dumps(response_data).encode('utf-8'))
        else:
            self.send_error(404, "Not Found")

    def do_POST(self):
        """Handle POST requests."""
        # Default to 0 so a body-less POST doesn't raise a KeyError
        content_length = int(self.headers.get('Content-Length', 0))
        post_data = self.rfile.read(content_length)
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        response_data = {
            "message": f"POST request received by {SERVER_ID}",
            "your_data": post_data.decode('utf-8'),
            "path": self.path,
            "headers": dict(self.headers)
        }
        self.wfile.write(json.dumps(response_data).encode('utf-8'))

def run(server_class=HTTPServer, handler_class=SimpleHandler, port=PORT):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print(f"Starting backend server {SERVER_ID} on http://localhost:{port}...")
    httpd.serve_forever()

if __name__ == "__main__":
    run()
Part 2: The Load Balancer Implementation
Now for the main event. We'll create a single load_balancer.py file. It will handle all of the logic described above.
1. The Core Class & Concurrency
Theory: Our load balancer needs to handle many simultaneous client connections. Python’s default HTTPServer is single-threaded—it can only handle one request at a time. If we used it, a single slow request would block all other clients.
Solution: We use ThreadingHTTPServer. It spawns a new thread for each incoming connection, allowing us to handle many requests concurrently.
Thread Safety: This concurrency introduces a problem: what if two threads try to pick a server at the exact same time? They might both get the same server or corrupt the “current server” index. This is a race condition.
Solution: We use a threading.Lock. This lock ensures that only one thread can enter the “critical section” (the code that picks the next server) at a time.
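To make the race concrete, here is a minimal standalone sketch (separate from our load balancer) contrasting an unprotected shared index with a lock-protected one:
Python
import threading

servers = ["A", "B", "C"]
index = 0
lock = threading.Lock()

def pick_server_unsafe():
    global index
    # Two threads can both read index == 0 here before either increments it,
    # so both get server "A" -- a race condition.
    server = servers[index]
    index = (index + 1) % len(servers)
    return server

def pick_server_safe():
    global index
    with lock:  # only one thread at a time may run this critical section
        server = servers[index]
        index = (index + 1) % len(servers)
        return server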
2. Health Checking System
Theory: We can’t just assume our servers are online. We need an active health check system. This system will run in a separate, background thread and periodically poll the /health endpoint of each backend server.
It maintains a list of “healthy” servers. The main routing logic will only ever choose from this list.
3. Round-Robin Algorithm
Theory: This is our distribution strategy. It’s the simplest and most common. We keep a list of healthy servers and an index.
Request 1 goes to Server A (index 0).
Request 2 goes to Server B (index 1).
Request 3 goes to Server C (index 2).
Request 4 goes back to Server A (index 0).
Mathematical Model: current_index = (current_index + 1) % total_healthy_servers
This modulo operator ensures the index wraps around to 0 when it reaches the end of the list.
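A quick worked example of the wraparound, assuming three healthy servers:
Python
# Round-Robin index sequence with 3 healthy servers
index = 0
for request_number in range(1, 7):
    print(f"Request {request_number} -> healthy server index {index}")
    index = (index + 1) % 3  # wraps 0, 1, 2, 0, 1, 2, ...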
4. Request Forwarding (Reverse Proxy)
Theory: The load balancer acts as a reverse proxy. It terminates the client’s HTTP connection, starts a new HTTP connection to a chosen backend server, and then shuttles the data between them.
Data Flow:
1. The client connects to the load balancer (port 8000).
2. The load balancer's do_GET (or do_POST) method is triggered.
3. The load balancer calls its get_next_healthy_server() function.
4. It uses the requests library to make an identical request to the chosen backend (e.g., port 8001).
5. It reads the response (status code, headers, body) from the backend.
6. It copies this response and sends it back to the original client.
💻 Code: load_balancer.py
This file implements all the concepts above.
# load_balancer.py
import threading
import time
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

# --- Configuration ---
# List of all backend servers (IP:Port)
BACKEND_SERVERS = [
    "http://localhost:8001",
    "http://localhost:8002",
    "http://localhost:8003"  # Add more as needed
]

# List of servers currently deemed healthy.
# This list is dynamic and managed by the health checker.
HEALTHY_SERVERS = []

# Current index for Round-Robin
CURRENT_SERVER_INDEX = 0

# Lock for thread-safe operations on shared state (HEALTHY_SERVERS, CURRENT_SERVER_INDEX)
LOCK = threading.Lock()

# Health check configuration
HEALTH_CHECK_INTERVAL = 5  # seconds
HEALTH_CHECK_TIMEOUT = 2   # seconds

# --- Health Checking Logic ---
def health_check_servers():
    """
    Runs in a separate thread to periodically check the health of backend servers.
    Updates the global HEALTHY_SERVERS list.
    """
    global HEALTHY_SERVERS
    while True:
        print(f"[Health Check] Pinging backend servers... ({len(BACKEND_SERVERS)} total)")
        current_healthy_servers = []
        for server in BACKEND_SERVERS:
            try:
                # Ping the '/health' endpoint
                response = requests.get(f"{server}/health", timeout=HEALTH_CHECK_TIMEOUT)
                # Check for a 200 OK status
                if response.status_code == 200:
                    current_healthy_servers.append(server)
                    print(f"[Health Check] {server} is HEALTHY")
                else:
                    print(f"[Health Check] {server} is UNHEALTHY (Status: {response.status_code})")
            except requests.exceptions.RequestException as e:
                # Timeout, connection error, etc.
                print(f"[Health Check] {server} is UNHEALTHY (Error: {e.__class__.__name__})")
        # --- Critical Section: Update the global list ---
        with LOCK:
            HEALTHY_SERVERS = current_healthy_servers
        # --- End Critical Section ---
        print(f"[Health Check] Active servers: {HEALTHY_SERVERS}")
        time.sleep(HEALTH_CHECK_INTERVAL)

def get_next_healthy_server():
    """
    Uses Round-Robin to select the next healthy server.
    This function is thread-safe thanks to the lock.
    """
    global CURRENT_SERVER_INDEX
    # --- Critical Section: Read/Write shared variables ---
    with LOCK:
        if not HEALTHY_SERVERS:
            # No healthy servers available
            return None
        # Re-wrap the index first, in case the healthy list shrank since last time
        CURRENT_SERVER_INDEX %= len(HEALTHY_SERVERS)
        server = HEALTHY_SERVERS[CURRENT_SERVER_INDEX]
        # Advance the index for the next request, wrapping around
        CURRENT_SERVER_INDEX = (CURRENT_SERVER_INDEX + 1) % len(HEALTHY_SERVERS)
        return server
    # --- End Critical Section ---

# --- HTTP Handler & Request Forwarding ---
class LoadBalancerHandler(BaseHTTPRequestHandler):
    def _forward_request(self, method):
        """
        Generic function to forward GET, POST, PUT, DELETE requests.
        """
        # Step 1: Get the next healthy backend server
        target_server = get_next_healthy_server()
        if target_server is None:
            # --- Error Handling: No Servers Available ---
            self.send_response(503)  # 503 Service Unavailable
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"error": "No healthy backend servers available"}).encode('utf-8'))
            print("[Load Balancer] Error: 503 Service Unavailable. No healthy servers.")
            return

        print(f"[Load Balancer] Forwarding {method} request for {self.path} to {target_server}")

        # Step 2: Build the full target URL
        target_url = f"{target_server}{self.path}"

        # Step 3: Copy headers from the original client request.
        # Drop 'Host' so requests sets the correct one for the backend.
        headers = {k: v for k, v in self.headers.items() if k.lower() != 'host'}

        # Step 4: Read the request body (for POST, PUT)
        body = None
        if 'Content-Length' in self.headers:
            content_length = int(self.headers['Content-Length'])
            body = self.rfile.read(content_length)

        try:
            # Step 5: Send the request to the backend server.
            # stream=True allows us to proxy the response chunk-by-chunk.
            response = requests.request(
                method,
                target_url,
                headers=headers,
                data=body,
                timeout=5,
                stream=True
            )
            # Step 6: Proxy the backend's response back to the client.
            # Copy the status code.
            self.send_response(response.status_code)
            # Copy all response headers
            for key, value in response.headers.items():
                # Avoid proxying connection-related headers
                if key.lower() not in ('transfer-encoding', 'content-encoding', 'connection'):
                    self.send_header(key, value)
            self.end_headers()
            # Stream the response body
            for chunk in response.iter_content(chunk_size=8192):
                self.wfile.write(chunk)
        except requests.exceptions.RequestException as e:
            # --- Error Handling: Backend Server Failed Mid-Request ---
            self.send_response(502)  # 502 Bad Gateway
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"error": "Bad Gateway", "details": str(e)}).encode('utf-8'))
            print(f"[Load Balancer] Error: 502 Bad Gateway while contacting {target_server}. Error: {e}")

    # --- Implement HTTP Methods ---
    def do_GET(self):
        self._forward_request('GET')

    def do_POST(self):
        self._forward_request('POST')

    def do_PUT(self):
        self._forward_request('PUT')

    def do_DELETE(self):
        self._forward_request('DELETE')

# --- Main Execution ---
def run(server_class=ThreadingHTTPServer, handler_class=LoadBalancerHandler, port=8000):
    # Start the health check thread.
    # daemon=True means the thread will exit when the main program exits.
    health_thread = threading.Thread(target=health_check_servers, daemon=True)
    health_thread.start()

    # Wait a moment for the initial health check to populate
    print("Waiting for initial health check...")
    time.sleep(HEALTH_CHECK_INTERVAL)  # Give it time for the first run
    print(f"Initial healthy servers: {HEALTHY_SERVERS}")
    if not HEALTHY_SERVERS:
        print("WARNING: No healthy servers found on startup. Will keep trying.")

    # Start the main, client-facing load balancer server
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print(f"Starting load balancer on http://localhost:{port}...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    print("Stopping load balancer...")

if __name__ == "__main__":
    run()
Part 3: Running The System
Now, let’s see it all in action.
1. Save the files: backend_server.py and load_balancer.py, and install the one third-party dependency with pip install requests.
2. Open three terminal/PowerShell windows for your backend servers and run one command in each:
python backend_server.py 8001
python backend_server.py 8002
python backend_server.py 8003
3. Open a fourth terminal for the load balancer:
python load_balancer.py
4. Test the load balancer: open a fifth terminal and use curl (or your browser).
Test 1: Round-Robin
Run this command several times:
Bash
curl http://localhost:8000/
You will see the response change, cycling through your servers:
JSON
{"message": "Hello from Server_on_Port_8001!"}
{"message": "Hello from Server_on_Port_8002!"}
{"message": "Hello from Server_on_Port_8003!"}
{"message": "Hello from Server_on_Port_8001!"}
...and so on.
Test 2: Failure & Recovery
Go to one of your backend terminals (e.g., Port 8002) and press Ctrl+C to stop it.
Watch the load balancer’s terminal. Within 5 seconds, it will report:
[Health Check] http://localhost:8002 is UNHEALTHY (Error: ConnectionError)
[Health Check] Active servers: ['http://localhost:8001', 'http://localhost:8003']
Now, run curl http://localhost:8000/ again. You will see it only alternates between 8001 and 8003, completely skipping the dead server.
Now, restart the server on port 8002: python backend_server.py 8002.
Within 5 seconds, the health checker will find it, and it will be added back into the rotation automatically!
Test 3: POST Requests
Bash
curl -X POST -d "mydata=123" http://localhost:8000/
Output:
JSON
{
  "message": "POST request received by Server_on_Port_8001",
  "your_data": "mydata=123",
  ...
}
It works! The load balancer correctly forwarded the method, headers, and body.
Part 4: Advanced Features Roadmap (Theory)
Our simple load balancer is great, but real-world systems like Nginx or AWS ELB have more advanced features. Here’s how they build on our concepts.
Weighted Round-Robin
Theory: Our Round-Robin assumes all servers are equal. What if Server A is a powerful machine and Server B is a small one? We should send Server A more traffic.
Implementation Concept: You'd assign a "weight" to each server (e.g., {'server': 'A', 'weight': 3}, {'server': 'B', 'weight': 1}). The algorithm would send 3 requests to A for every 1 request it sends to B.
Pseudo-code:
Python
# Server list: [A, A, A, B]
# Or, a sequence generator:
servers_with_weight = [
    (server1, 3),
    (server2, 1),
    (server3, 2)
]
# A more complex algorithm would then pick from this list respecting the
# weights, often using a "current_weight" tracker (sketched below).
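To make that concrete, here is one possible sketch of the "current_weight" approach (the smooth weighted Round-Robin algorithm popularized by Nginx); the server names and weights below are just illustrative:
Python
# Smooth weighted Round-Robin: on each pick, every server's current_weight
# grows by its configured weight; the highest current_weight wins and is
# then reduced by the total weight. Over time, picks match the weight ratio.
servers = [
    {"name": "A", "weight": 3, "current_weight": 0},
    {"name": "B", "weight": 1, "current_weight": 0},
    {"name": "C", "weight": 2, "current_weight": 0},
]
TOTAL_WEIGHT = sum(s["weight"] for s in servers)

def pick_weighted():
    for s in servers:
        s["current_weight"] += s["weight"]
    best = max(servers, key=lambda s: s["current_weight"])
    best["current_weight"] -= TOTAL_WEIGHT
    return best["name"]

print([pick_weighted() for _ in range(6)])  # ['A', 'C', 'A', 'B', 'C', 'A'] -- a 3:1:2 split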
Least Connections Algorithm
Theory: Round-Robin is “dumb”—it doesn’t care about a server’s current load. What if Server A got a very slow, complex request that will take 10 seconds, while Server B is idle? Round-Robin would still send the next request to Server B, then the next to Server A (which is still busy!).
Solution: The load balancer maintains a counter of active connections for each server. It always sends the next request to the server with the fewest active connections.
Implementation Concept:
Python
# We'd need to increment/decrement a counter for each server
# server_connections = {'A': 5, 'B': 2, 'C': 4}
def get_least_connections_server():
    with lock:
        # Find the server with the minimum value in our connection dict
        chosen_server = min(server_connections, key=server_connections.get)
        # Increment its count before returning
        server_connections[chosen_server] += 1
        return chosen_server
# We would also need to decrement the count when a request finishes (see the sketch below)
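And one way to handle that decrement (a sketch, reusing the same hypothetical server_connections dict and lock) is a try/finally around the forwarding step, so the count always drops back even when the backend call fails:
Python
def handle_request(forward_fn):
    # forward_fn is a hypothetical callable that proxies the request
    server = get_least_connections_server()  # increments the counter
    try:
        return forward_fn(server)
    finally:
        with lock:
            server_connections[server] -= 1  # release the "slot" when the request finishes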
Session Persistence (Sticky Sessions)
Theory: Some applications are stateful: they store user data (like a shopping cart) in the memory of the server that handled the request. If Request 1 (add to cart) goes to Server A, and Request 2 (checkout) goes to Server B, Server B won't know about the shopping cart!
Solution: “Sticky sessions.” The load balancer ensures that all requests from the same user go to the same server.
Implementation Concept (Cookie-based):
First Request: The user requests a page.
Load Balancer: Picks Server A (using Round-Robin).
Load Balancer: Before sending the response back to the client, it adds a cookie: Set-Cookie: session_id=serverA.
Subsequent Requests: The client's browser automatically includes this cookie.
Load Balancer: Reads the cookie. "Ah, session_id=serverA. I must send this to Server A" (as long as Server A is healthy).
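A minimal sketch of how this could plug into our handler, using the standard http.cookies module for parsing (the session_id values and the SERVER_BY_ID mapping are illustrative assumptions):
Python
from http.cookies import SimpleCookie

# Hypothetical mapping from cookie values back to backend URLs
SERVER_BY_ID = {
    "server_8001": "http://localhost:8001",
    "server_8002": "http://localhost:8002",
}

def choose_server(handler):
    # 1. If the client sent a session cookie for a healthy server, honor it.
    cookie = SimpleCookie(handler.headers.get("Cookie", ""))
    if "session_id" in cookie:
        sticky = SERVER_BY_ID.get(cookie["session_id"].value)
        if sticky in HEALTHY_SERVERS:
            return sticky
    # 2. Otherwise fall back to Round-Robin; the response should then include
    #    Set-Cookie: session_id=<id of the chosen server> to pin future requests.
    return get_next_healthy_server()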
SSL Termination
Theory: Handling HTTPS (SSL/TLS) encryption and decryption is computationally expensive. Instead of making every backend server do this work, the load balancer can do it once, in a central place.
Flow:
Client → Load Balancer: Encrypted HTTPS request.
Load Balancer: Decrypts the request.
Load Balancer → Backend Server: Plain, unencrypted HTTP request (this is faster, as it’s on a secure internal network).
Backend Server → Load Balancer: Plain HTTP response.
Load Balancer → Client: Encrypts the response and sends it as HTTPS.
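In Python, terminating TLS in front of our existing handler could look like the sketch below, using the standard ssl module; cert.pem and key.pem are assumed files (a self-signed pair works for local testing):
Python
import ssl
from http.server import ThreadingHTTPServer
from load_balancer import LoadBalancerHandler  # the handler we built above

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="cert.pem", keyfile="key.pem")  # assumed files

httpd = ThreadingHTTPServer(("", 8443), LoadBalancerHandler)
# Wrap the listening socket: clients now speak HTTPS to port 8443, while
# _forward_request still talks plain HTTP to the backends -- SSL termination.
httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
httpd.serve_forever()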
Rate Limiting & Circuit Breakers
Rate Limiting: Protects your backend from abuse or denial-of-service (DoS) attacks. The load balancer can be configured to allow only X requests per second from a single IP. Any more are rejected with a 429 Too Many Requests error.
Circuit Breaker: An advanced health check. If a server starts failing often (e.g., 50% of requests fail), the load balancer "trips the circuit" and stops sending any traffic to it for a set time (e.g., 30 seconds), giving it time to recover and preventing cascading failures.
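A simple per-IP sliding-window limiter could sit at the top of _forward_request; this is just a sketch, and the limit and window values are illustrative:
Python
import time
import threading
from collections import defaultdict

RATE_LIMIT = 10        # max requests per window, per client IP (illustrative)
WINDOW_SECONDS = 1.0
_request_log = defaultdict(list)  # client_ip -> timestamps of recent requests
_rate_lock = threading.Lock()

def allow_request(client_ip):
    """Return True if this IP is under the limit for the current window."""
    now = time.time()
    with _rate_lock:
        # Keep only timestamps still inside the window.
        recent = [t for t in _request_log[client_ip] if now - t < WINDOW_SECONDS]
        if len(recent) >= RATE_LIMIT:
            _request_log[client_ip] = recent
            return False  # caller should respond with 429 Too Many Requests
        recent.append(now)
        _request_log[client_ip] = recent
        return True

# In the handler, before forwarding:
#     if not allow_request(self.client_address[0]):
#         self.send_error(429, "Too Many Requests")
#         return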
Conclusion
You’ve successfully built a functional, concurrent load balancer in Python! You’ve implemented the core components of a modern web architecture: service distribution (Round-Robin), resilience (health checks), and concurrency (threading).
This foundation underlies all load balancing, from simple projects to the massive, globally distributed systems run by Google and Amazon.