At the Pusher office, we have a little counter, constantly incrementing. It shows the total number of messages which have ever been delivered via Pusher Channels. On Friday night, at 22:20 UTC, that counter gained another digit, reaching 10,000,000,000,000 messages. There are 13 zeroes in that number, or ten trillion.

Screenshot of the message counter

You might think that this total messages counter is a “vanity metric”. But this number is a key indicator of success for Pusher Channels, our realtime communication offering. First, this counter is measuring the trust given to us by our users. Second, this counter is measuring the scalability of our system. To increase the counter, we at Pusher must ensure users trust us to deliver more messages, and we must ensure that our system is capable of delivering those messages. What does it take to deliver ten trillion messages? Let’s have a look.

In a typical second, Pusher Channels sends around 200,000 messages. It has reached peaks of millions per second, for example when the New York Times used Channels for its realtime election night coverage. We can start by seeing Pusher Channels as a big black box that these messages pass through:

Pusher Channels is a “publish-subscribe” system. Clients subscribe to channels like btc-usd or private-user-jim, and then other clients publish messages to those channels. If a million clients are subscribed to the channel btc-usd, and someone publishes the latest Bitcoin price to btc-usd, then Pusher Channels must deliver a million new messages. It does so in a few milliseconds.
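The publish-subscribe model above can be sketched in a few lines. This is a minimal in-memory broker, not Pusher's actual implementation; the `Broker` class and its method names are illustrative:

```python
from collections import defaultdict

class Broker:
    """A toy in-memory publish-subscribe broker (illustrative only)."""

    def __init__(self):
        # channel name -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Every subscriber of the channel receives a copy of the message.
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])

broker = Broker()
received = []
broker.subscribe("btc-usd", received.append)
broker.subscribe("btc-usd", received.append)

# One publish to btc-usd becomes one delivery per subscriber.
delivered = broker.publish("btc-usd", {"price": 42000})
```

With a million subscribers on `btc-usd`, that single `publish` call would mean a million deliveries, which is exactly why a single loop on a single server isn't enough.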

A single server cannot deliver all one million messages in such a short time span. Channels employs three time-honored techniques to deliver these messages at low latency: fan-out, sharding, and load balancing. Let’s look inside the box!

The millions of subscribers are spread across around 170 large edge servers, each of which holds around 20,000 connections to subscribers. Each edge server remembers the channels that its clients are interested in, and subscribes to these channels in a central service, Redis. Even if an edge server has 2,000 clients interested in btc-usd, the edge server only needs to subscribe once to btc-usd in Redis. Thus, when a single message is published to btc-usd, Redis sends 170 messages to the edge servers, then each edge server sends a further 20,000 messages to subscribers. This technique is called fan-out.

With fan-out alone, there is still a central Redis component which all publishes go through. Such centralization would limit the number of publishes per second. To get past this limit, the central Redis service is made up of many Redis shards. Each channel (such as btc-usd) is assigned to a Redis shard, by using a hash of the channel name. When a client publishes a message, this goes to the rest service. The rest service hashes the channel name to discover the right Redis shard to pass the message to. This technique is called sharding.
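Channel-to-shard assignment can be as simple as a stable hash modulo the shard count. A sketch, assuming a hypothetical shard count of 16 (the article doesn't say how many Redis shards exist):

```python
import hashlib

NUM_SHARDS = 16  # hypothetical; the real shard count isn't stated

def shard_for(channel: str) -> int:
    # A stable hash of the channel name means every rest server,
    # independently and without coordination, routes publishes for
    # the same channel to the same Redis shard.
    digest = hashlib.sha256(channel.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The important property is determinism: `shard_for("btc-usd")` returns the same shard on every rest server, so subscribers and publishers for a channel always meet at the same shard.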

It sounds like this just pushes the centralization to the rest service. It does not, because the rest service is composed of around 90 rest servers, each of which does the same job: accept a publish, compute the Redis shard, and send it on. When a publisher publishes a message, this goes to one of the many rest servers. This technique is called load balancing.
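Because every rest server is stateless and interchangeable, any balancing policy works. Round-robin is one common choice, sketched here; the article doesn't say which policy Pusher actually uses:

```python
import itertools

# ~90 identical, stateless rest servers (names are illustrative).
REST_SERVERS = [f"rest-{i}" for i in range(90)]

# Round-robin: hand each incoming publish to the next server in turn.
_next_server = itertools.cycle(REST_SERVERS)

def route_publish():
    return next(_next_server)

# The first 90 publishes each land on a different server, so no single
# server becomes a bottleneck.
first_batch = [route_publish() for _ in range(90)]
```

Since each rest server computes the same channel-to-shard mapping, it doesn't matter which one a publish lands on: the message still reaches the correct Redis shard.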

Together, sharding, fan-out and load balancing ensure that the Channels system has no single central component. This property is key to the horizontal scalability which enables it to send millions of messages per second.

This is the core of the Channels service, but there are many other parts, such as metrics (how we got the number ten trillion!), webhooks (telling customers about interesting events), authorization (restricting access to channels), presence data (live user lists), and rate limiting (ensuring customers get what they pay for and don’t affect other tenants in the system). All of these need to be achieved without affecting the throughput, latency or uptime of the publish-subscribe service.

Interested in solving these challenges, and helping us add another digit to our “total messages” counter? We’re hiring! See