In the high-stakes ecosystem of global cryptocurrency trading, latency is the enemy and stale data is a fatal flaw. When Bitcoin surges or crashes, millions of active traders expect their dashboard tickers to update in real time, simultaneously and without perceptible delay. Delivering a single backend pricing event to millions of connected web and mobile clients is one of the most notoriously difficult challenges in modern distributed systems.
Historically, this meant provisioning massive fleets of virtual machines to hold open TCP connections. Today, we are shifting the paradigm.
In this tutorial, we will architect a strictly serverless, decoupled WebSocket fan-out system capable of handling millions of concurrent users. We will leverage Alibaba Cloud API Gateway (WebSocket Mode) to hold the physical connections, Function Compute (FC) 3.0 for stateless logic execution, and Tair (Redis Pub/Sub) as our high-speed, in-memory message broker.
1. The Stateful Bottleneck: Why Traditional Node.js Servers Crash Under Massive Load
To appreciate the serverless approach, we must first dissect why traditional, stateful architectures fail at this scale.
The C10M Problem and the OS Toll
The traditional approach to real-time communication involves frameworks like Socket.io running on Node.js, hosted on Elastic Compute Service (ECS) instances behind a Server Load Balancer (SLB). In this model, the ECS instance must physically hold the TCP connection open for every connected client.
While the “C10K” (10,000 connections) problem was solved years ago, the “C10M” (10 million connections) problem introduces severe operating system and runtime bottlenecks:
- File Descriptor Exhaustion: Every TCP connection is a file descriptor in Linux. Managing millions of these requires deep kernel tuning (fs.file-max, net.ipv4.ip_local_port_range), and eventually you hit hard hardware limits on network interface cards (NICs).
- V8 Garbage Collection Pauses: Node.js is single-threaded. When a framework like Socket.io maintains millions of connection objects in memory, the V8 JavaScript engine struggles with memory allocation. When the garbage collector (GC) runs, it halts the event loop (“stop-the-world”). A 500ms GC pause during a high-frequency crypto trading spike results in massive queue buildups and disconnected clients.
- The Nightmare of Auto-Scaling Stateful Clusters: When traffic spikes, auto-scaling groups spin up new ECS instances. However, existing load balancers with “sticky sessions” cannot easily rebalance established, long-lived WebSocket TCP connections without dropping them. Scaling down is even worse; you must drain connections slowly, forcing clients to reconnect, which causes a “thundering herd” problem that can take down the newly scaled cluster.
The Solution: We must completely decouple the persistent connection layer from the business logic layer.
2. Architecture Flow: The Stateless Decoupling Strategy
Alibaba Cloud offers a native mechanism to break this stateful bottleneck by using API Gateway as a managed connection pool. Here is how the decoupled architecture flows:
- The Edge Connection: The client initiates a WebSocket connection (wss://) to the Alibaba Cloud API Gateway.
- Connection Persistence: API Gateway accepts the handshake and physically holds the TCP connection, acting as a massive, managed proxy. It generates a unique connectionId for that specific client.
- Stateless Logic (FC 3.0): API Gateway translates WebSocket lifecycle events (Connect, Disconnect, Message) into standard HTTP POST requests and sends them to Function Compute (FC) 3.0.
- State Storage (Tair): FC 3.0, acting purely statelessly, takes the connectionId and saves it to a Tair (Redis) cluster. The FC container then goes to sleep or dies. No connections are held in the compute layer.
- The Broadcast (Pub/Sub): The backend crypto pricing engine detects a price change. It publishes a single message to a Tair Pub/Sub channel.
- The Fan-Out: A background FC trigger reads the Tair message, pulls the active connectionIds, and uses the API Gateway Reverse Invocation SDK to push the payload back to API Gateway. API Gateway then forwards the message down the open TCP pipes to the clients.
Because FC 3.0 scales to tens of thousands of concurrent instances in milliseconds and Tair provides sub-millisecond data access, the entire pipeline is serverless, auto-scaling, and strictly stateless.
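From the client's perspective, all of this machinery is invisible: the browser opens one socket and sends a subscribe message. A minimal client sketch follows; the wss:// URL is a placeholder for your deployed API Gateway group domain, and the {action, pair} message shape matches the handler we build in Step 3.2.

```javascript
// client.js - minimal client sketch (browser WebSocket API; use the
// 'ws' package in Node). GATEWAY_URL is a placeholder, not a real endpoint.
const GATEWAY_URL = 'wss://example-group.cn-hangzhou.aliyuncs.com/';

// Build the subscribe payload the FC handler expects ({action, pair}).
function buildSubscribeMessage(pair) {
  return JSON.stringify({ action: 'subscribe', pair });
}

// Open the socket, subscribe on connect, and hand ticks to a callback.
function connect(pair, onTick) {
  const ws = new WebSocket(GATEWAY_URL);
  ws.onopen = () => ws.send(buildSubscribeMessage(pair));
  ws.onmessage = (evt) => onTick(JSON.parse(evt.data));
  ws.onclose = () => setTimeout(() => connect(pair, onTick), 1000); // naive reconnect
  return ws;
}

module.exports = { buildSubscribeMessage, connect };
```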
3. Implementation Details
Let’s translate this architecture into reality. We will first configure the API Gateway to handle two-way WebSocket communication, and then write the Function Compute code to manage the state and fan-out.
Step 3.1: Configuring the WebSocket API Gateway (Aliyun CLI)
Alibaba Cloud API Gateway supports a specific Two-Way communication mode required for pushing data from the backend to the client.
First, create an API Group:
```bash
aliyun cloudapi CreateApiGroup \
  --GroupName "CryptoTickerWS" \
  --Description "API Group for Serverless WebSockets" \
  --RegionId cn-hangzhou
```
Next, define the API. We configure the API to accept WebSocket protocol and map it to our Function Compute endpoint. Notice the ServiceConfig setup which dictates how API GW talks to our FC instance.
```bash
aliyun cloudapi CreateApi \
  --GroupId "<YOUR_GROUP_ID>" \
  --ApiName "WSRegister" \
  --Visibility "PUBLIC" \
  --AuthType "ANONYMOUS" \
  --RequestConfig '{"RequestProtocol":"WEBSOCKET","RequestHttpMethod":"GET","RequestPath":"/"}' \
  --ServiceConfig '{"ServiceProtocol":"FunctionCompute","ContentTypeValue":"application/json","FunctionComputeConfig":{"FcRegionId":"cn-hangzhou","ServiceName":"TickerService","FunctionName":"ws-handler","RoleArn":"acs:ram::123456789:role/aliyunapigatewayaccessingfcrole"}}' \
  --WebSocketApiType "TWO_WAY"
```
Crucial Detail: The --WebSocketApiType "TWO_WAY" parameter is what allows our backend to asynchronously push messages back to the client using the connection ID.
Step 3.2: The FC 3.0 Stateless Handler (Node.js)
In Function Compute 3.0, we write a single monolithic handler that routes each request based on system headers injected by API Gateway: x-ca-websocket-api-type (whether the event is a REGISTER, UNREGISTER, or NORMAL message) and x-ca-websockets-document-id (the connection ID).
```javascript
// index.js (FC 3.0 Handler)
const Redis = require('ioredis');
const Core = require('@alicloud/pop-core');

// Initialize Tair (Redis) client
const redis = new Redis({
  host: process.env.TAIR_HOST,
  port: process.env.TAIR_PORT,
  password: process.env.TAIR_PASSWORD,
});

// Initialize API Gateway SDK for reverse invocation
const rpcClient = new Core({
  accessKeyId: process.env.ALIBABA_CLOUD_ACCESS_KEY_ID,
  accessKeySecret: process.env.ALIBABA_CLOUD_ACCESS_KEY_SECRET,
  endpoint: `https://apigateway.${process.env.REGION}.aliyuncs.com`,
  apiVersion: '2016-07-14'
});

exports.handler = async (req, res) => {
  // Extract headers injected by API Gateway
  const eventType = req.headers['x-ca-websocket-api-type'];
  const connectionId = req.headers['x-ca-websockets-document-id'];

  try {
    switch (eventType) {
      case 'REGISTER':
        // Store connection in Tair (Set data structure)
        await redis.sadd('active_connections', connectionId);
        res.setStatusCode(200);
        res.send('Connected');
        break;
      case 'UNREGISTER':
        // Remove connection from Tair
        await redis.srem('active_connections', connectionId);
        res.setStatusCode(200);
        res.send('Disconnected');
        break;
      case 'NORMAL': {
        // Handle incoming client messages (e.g., subscribing to specific crypto pairs)
        const parsed = JSON.parse(req.body.toString());
        if (parsed.action === 'subscribe') {
          // Add connectionId to a specific topic set, e.g., 'topic:BTC-USD'
          await redis.sadd(`topic:${parsed.pair}`, connectionId);
        }
        res.setStatusCode(200);
        res.send('Message Received');
        break;
      }
      default:
        res.setStatusCode(400);
        res.send('Unknown Event');
    }
  } catch (error) {
    console.error('Error processing WS event:', error);
    res.setStatusCode(500);
    res.send('Internal Server Error');
  }
};
```
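Because the handler is pure routing plus a couple of Tair writes, the switch logic is easy to unit-test in isolation. One hypothetical refactor (not part of the deployed code above) injects the store, so a plain in-memory stand-in can replace Tair and verify every lifecycle transition without any cloud resources:

```javascript
// route.js - the handler's routing logic factored into a pure function.
// `store` is any object exposing sadd/srem (ioredis in production,
// an in-memory stand-in in tests).
async function routeEvent(headers, body, store) {
  const eventType = headers['x-ca-websocket-api-type'];
  const connectionId = headers['x-ca-websockets-document-id'];

  switch (eventType) {
    case 'REGISTER':
      await store.sadd('active_connections', connectionId);
      return { status: 200, body: 'Connected' };
    case 'UNREGISTER':
      await store.srem('active_connections', connectionId);
      return { status: 200, body: 'Disconnected' };
    case 'NORMAL': {
      const parsed = JSON.parse(body.toString());
      if (parsed.action === 'subscribe') {
        await store.sadd(`topic:${parsed.pair}`, connectionId);
      }
      return { status: 200, body: 'Message Received' };
    }
    default:
      return { status: 400, body: 'Unknown Event' };
  }
}

// Trivial in-memory stand-in for the two Tair commands we use.
function memoryStore() {
  const sets = new Map();
  const get = (k) => sets.get(k) || sets.set(k, new Set()).get(k);
  return {
    sets,
    sadd: async (k, v) => get(k).add(v),
    srem: async (k, v) => get(k).delete(v),
  };
}

module.exports = { routeEvent, memoryStore };
```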
Step 3.3: The Fan-Out Execution
When the crypto pricing engine wants to update clients, it triggers a background function. This function pulls the connection IDs and uses the SendSystemMessage API to push data.
```javascript
// fanout.js (FC 3.0 Background Task)
// Assumes `redis` (ioredis) and `rpcClient` (@alicloud/pop-core) are
// initialized exactly as in index.js above.

async function broadcastPriceUpdate(pair, priceData) {
  // 1. Fetch all connection IDs subscribed to this pair
  const connectionIds = await redis.smembers(`topic:${pair}`);

  // 2. Push to clients concurrently
  const promises = connectionIds.map(connId => {
    return rpcClient.request('SendSystemMessage', {
      DeviceId: connId,
      MessageBody: Buffer.from(JSON.stringify(priceData)).toString('base64'),
    }, { method: 'POST' }).catch(err => {
      // Handle stale connections (e.g., API GW dropped it but Tair hasn't synced)
      console.warn(`Failed to send to ${connId}:`, err.message);
      return redis.srem(`topic:${pair}`, connId); // Cleanup
    });
  });

  await Promise.all(promises);
}
```
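The pricing engine's side of the Pub/Sub hop is a single PUBLISH. A sketch of that publisher, written against an injected client so it works with ioredis or any compatible stub; the ticks: channel prefix and payload shape are illustrative conventions, not a fixed Alibaba Cloud API.

```javascript
// publisher.js - the pricing engine's side of the Tair Pub/Sub hop.
// `client` is any object with publish(channel, message) (ioredis in
// production). The 'ticks:' prefix is an assumed naming convention.
async function publishTick(client, pair, priceData) {
  const channel = `ticks:${pair}`;
  const message = JSON.stringify({
    pair,
    ...priceData,
    ts: Date.now(), // server-side timestamp, useful for latency measurement
  });
  await client.publish(channel, message);
  return { channel, message };
}

module.exports = { publishTick };
```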
4. The ‘MVP’ Failure Mode: Breaking at 10 Million Users
The architecture above is a solid Minimum Viable Product (MVP). It works beautifully for 10,000, or even 100,000, concurrent users. However, at the scale of a tier-one crypto exchange, say 10 million concurrent users, this exact implementation will fail catastrophically. As an architect, you must anticipate these failure modes.
Failure Mode 1: Tair Memory and Bandwidth Exhaustion
In our MVP, we use a single Redis SET to store topic:BTC-USD. If 10 million users subscribe to Bitcoin, reading 10 million IDs (SMEMBERS) blocks the Redis main thread. Furthermore, a single Redis node has a NIC bandwidth limit (typically 10Gbps to 25Gbps). Pushing a 1KB JSON payload to 10 million users requires processing 10GB of data. Attempting to pull 10 million IDs and push payloads through a single Tair instance will saturate the network interface, causing timeouts and massive latency spikes.
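One immediate mitigation for the blocking read is to replace SMEMBERS with SSCAN, which pages through set members in small, non-blocking chunks. A sketch of an async iterator over SSCAN; the client is assumed to follow ioredis's [nextCursor, members] return shape.

```javascript
// scan.js - incremental set iteration, so a multi-million-member set
// never blocks the Redis main thread the way SMEMBERS does.
// `client.sscan` is assumed to return [nextCursor, members] (ioredis shape).
async function* scanMembers(client, key, pageSize = 1000) {
  let cursor = '0';
  do {
    const [next, members] = await client.sscan(key, cursor, 'COUNT', pageSize);
    cursor = next;
    yield* members; // hand back one page at a time
  } while (cursor !== '0'); // cursor '0' signals a completed scan
}

module.exports = { scanMembers };
```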
Failure Mode 2: The O(N) Loop and API Gateway Throttling
Our fanout.js uses Promise.all to iterate over connection IDs. Looping through 10 million elements in a single Node.js process will exhaust the V8 heap memory (OOM crash). Even if it survives, making 10 million sequential or batched HTTP requests to the API Gateway’s SendSystemMessage endpoint will hit account-level API rate limits, resulting in HTTP 429 Too Many Requests.
The Solution: Geographic Sharding and Tiered Fan-Out
To scale to millions, we must shard the data layer and parallelize the compute layer using a Tiered Fan-Out Strategy.
1. Geographically Shard Tair Pub/Sub
Instead of a single global Tair instance, deploy Tair instances geographically (e.g., US-East, EU-Central, AP-Southeast).
When a user connects, the API Gateway routes their REGISTER event to the local FC instance, which saves the connectionId to the regional Tair shard.
2. Tiered Topic Architecture
Instead of one massive array of IDs, break connections into manageable “buckets” (e.g., 10,000 users per bucket).
topic:BTC-USD:ap-southeast:bucket-1
topic:BTC-USD:ap-southeast:bucket-2
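Assigning a connection to a bucket should be a pure, deterministic hash, so any stateless FC instance maps the same connectionId to the same key without coordination. A sketch using FNV-1a; the hash choice, bucket count, and key format are illustrative.

```javascript
// buckets.js - deterministic bucket assignment. FNV-1a is an arbitrary
// choice here; any stable 32-bit string hash works.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit FNV prime multiply
  }
  return h;
}

// e.g. bucketKey('BTC-USD', 'ap-southeast', 'conn-42', 1000)
//   -> 'topic:BTC-USD:ap-southeast:bucket-<0..999>'
function bucketKey(pair, region, connectionId, bucketCount) {
  const bucket = fnv1a(connectionId) % bucketCount;
  return `topic:${pair}:${region}:bucket-${bucket}`;
}

module.exports = { fnv1a, bucketKey };
```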
3. The Multi-Stage Fan-Out Pipeline
When a price updates, we use a Master-Worker compute pattern:
- The Master FC: The crypto engine publishes to the Master FC. The Master FC queries the bucket registry and publishes sub-tasks to Message Service (MNS) or EventBridge. It sends a message saying: “Update BTC-USD for Bucket-1”, “Update BTC-USD for Bucket-2”.
- The Worker FCs: EventBridge triggers hundreds of Worker FC instances simultaneously. Each worker is responsible for exactly one bucket.
- Parallel Execution: A worker fetches its 10,000 IDs from its local Tair shard and pushes to the local API Gateway.
By distributing the load across hundreds of stateless FC containers and multiple Tair shards, no single NIC is saturated, no Node.js process is overwhelmed, and API Gateway traffic is distributed safely below throttling thresholds.
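The Master FC's job therefore reduces to enumerating buckets and emitting one sub-task per bucket. A sketch of that step, with the queue abstracted behind an injected sendTask function; in production that function would publish to MNS or EventBridge, which is swapped for a stub here.

```javascript
// master.js - the Master FC step of the tiered fan-out: one sub-task
// per bucket. `sendTask` dispatches to MNS/EventBridge in production;
// any async function works in tests.
async function fanOutPair(pair, regions, bucketsPerRegion, sendTask) {
  const tasks = [];
  for (const region of regions) {
    for (let b = 0; b < bucketsPerRegion; b++) {
      tasks.push({ pair, region, bucketKey: `topic:${pair}:${region}:bucket-${b}` });
    }
  }
  // Dispatch concurrently; each Worker FC handles exactly one bucket.
  await Promise.all(tasks.map(t => sendTask(t)));
  return tasks.length;
}

module.exports = { fanOutPair };
```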
5. Conclusion
Building a real-time, high-concurrency system no longer requires managing armies of idle servers holding empty TCP pipes. By utilizing Alibaba Cloud API Gateway as a massive, managed edge proxy, we can successfully decouple the persistent state of a WebSocket from the business logic required to fulfill it.
Function Compute 3.0 provides the raw, instantaneous horsepower to process connection lifecycles statelessly, while Tair acts as the ultra-low-latency connective tissue holding the network topology together.
True serverless architecture isn’t just about saving money on idle compute; it is about engineering systems where scaling up involves changing configurations and distributing data flows, rather than desperately provisioning virtual machines while your application burns. By applying geographic sharding and tiered fan-out patterns, this architecture scales linearly, allowing your crypto-ticker to serve its first user and its hundred-millionth user with the exact same millisecond latency.
