Redis Pub/Sub under the hood
Do you want to code a chat app? Here you’ll see how to do it the hard way. I show how Redis Pub/Sub works in detail, all the way down to the bits and bytes! This is the first part of a series of deep dives into Redis.
At Pusher, instead of treating our system as a stack of black boxes, we like to get our hands dirty and poke around. Today we’ll roll up our sleeves and dismantle the drivetrain of Pusher’s pub/sub system: Redis. Redis is best known as a key-value server. Clients open a TCP connection to a redis-server process, and start sending commands to it which modify the database:
But Redis is also a messaging server! A client interested in donuts can open a TCP connection to the Redis server, send “SUBSCRIBE donuts”, then wait for donut-related news. A news outlet can then connect to the Redis server, send “PUBLISH donuts 3-for-$1”, and the subscribing client will be notified of this lucrative offer:
Now let’s zoom in. We can imagine the Redis process keeping track of each socket’s subscription set:
But what does the inside of Redis really look like? Read on to find out!
View source!
Today we’ll show how Redis Pub/Sub is implemented by following the
source, which is clean ANSI C. To follow
along, clone the repository and run make
(yes, it is that simple to build). In this post, I’ll be using the latest
release, 3.2.6.
Pub/Sub was introduced in early 2010 in this little commit, back when Redis was just a single C file, redis.c. It's a few hundred lines of code, and the implementation has not changed much since. The original Pub/Sub implementation lets clients send three new kinds of command: PUBLISH, SUBSCRIBE, and UNSUBSCRIBE. To track subscriptions, Redis uses a global variable pubsub_channels which maps channel names to sets of subscribed client objects. A client object represents a TCP-connected client by tracking that connection's file descriptor.
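To make this concrete, here is a rough C sketch of those two pieces. The field and struct names are simplified for illustration (server_state stands in for the real redisServer struct); in Redis 3.2 the map actually lives inside the global server struct, and the real client struct in server.h has many more fields.

```c
/* Illustrative sketch only -- not the exact definitions from the source. */
typedef struct dict dict;   /* Redis's hash table type (dict.h) */
typedef struct list list;   /* Redis's doubly linked list type (adlist.h) */

typedef struct client {
    int fd;                 /* file descriptor of this client's TCP connection */
    dict *pubsub_channels;  /* channels this client is subscribed to
                               (kept in sync with the global map; see below) */
    /* ... the real struct has dozens more fields ... */
} client;

struct server_state {
    dict *pubsub_channels;  /* global map: channel name -> list of subscribed clients */
    list *pubsub_patterns;  /* pattern subscriptions, covered later */
    /* ... */
} server;
```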
When a client sends a SUBSCRIBE command, its client object gets added to the set of clients for that channel name. To PUBLISH, Redis looks up the subscribers in the pubsub_channels map, and for each client, it schedules a job to send the published message to the client's socket.
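Here is a condensed sketch of those two code paths (the real implementations are pubsubSubscribeChannel and pubsubPublishMessage in pubsub.c; the map_/list_/set_ helpers below are hypothetical stand-ins for Redis's dict and list APIs, and client/server refer to the illustrative structs sketched above):

```c
/* Sketch: subscribe a client to a channel. */
void subscribe_channel(client *c, const char *channel) {
    /* Remember the subscription on the client itself... */
    set_add(c->pubsub_channels, channel);
    /* ...and add the client to the channel's list of subscribers. */
    list *subscribers = map_lookup(server.pubsub_channels, channel);
    if (subscribers == NULL) {
        subscribers = list_create();
        map_insert(server.pubsub_channels, channel, subscribers);
    }
    list_append(subscribers, c);
}

/* Sketch: publish a message to every subscriber of a channel. */
void publish_message(const char *channel, const char *message) {
    list *subscribers = map_lookup(server.pubsub_channels, channel);
    if (subscribers == NULL) return;            /* nobody is listening */
    for (list_node *n = list_first(subscribers); n != NULL; n = list_next(n)) {
        client *c = list_value(n);
        /* Queue the message on the client's output buffer; the event loop
           writes it out when c->fd becomes writable. */
        queue_reply(c, channel, message);
    }
}
```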
Handling disconnections
Client connections can drop. Perhaps the client closed the connection, or a
network cable was pulled. When this happens, Redis must clean up the client’s
subscriptions.
Let’s say Client A disconnects. To remove the client from the pubsub_channels
structure, Redis would have to visit every channel (“donuts” and “bagels”) and
remove the client from each channel’s subscription set.
But visiting every channel is inefficient: Redis should only need to visit the
“donuts” channel, because that is the only one that Client A is subscribed to.
To enable this, Redis annotates each client with its set of subscribed
channels, and
keeps this in sync with the main pubsub_channels
structure. With this, instead
of iterating over every channel, Redis only needs to visit the channels which
it knows the client was subscribed
to. Let’s
draw these sets as red circles:
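A sketch of that cleanup, using the same hypothetical helpers as before (in the source this is roughly what pubsubUnsubscribeAllChannels in pubsub.c does when a client is freed):

```c
/* Sketch: a client disconnected, so remove it from every channel it was
   subscribed to. Only the channels recorded on the client are visited. */
void unsubscribe_all(client *c) {
    set_iter it = set_iterator(c->pubsub_channels);
    const char *channel;
    while ((channel = set_next(&it)) != NULL) {
        list *subscribers = map_lookup(server.pubsub_channels, channel);
        list_remove(subscribers, c);          /* scan this channel's client list */
        if (list_length(subscribers) == 0) {
            /* Last subscriber gone: drop the channel entry entirely. */
            map_delete(server.pubsub_channels, channel);
            list_free(subscribers);
        }
    }
    set_clear(c->pubsub_channels);
}
```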
Getting concrete
I've described the data structures as "maps" and "sets": the global pubsub_channels variable is logically a Map<ChannelName, Set<Client>>, and each client's subscription set is a Set<ChannelName>. But these are abstract data structures; they do not say how we represent them in memory. Let's start zooming in to allocated memory blocks.
The pubsub_channels map is actually a hash table. The channel name is hashed to a position in a 2^n-sized array, like this:
The pubsub_channels array, with buckets from 0 to 7, is a single allocated block of memory. To publish to a channel, we hash the channel's name to find its bucket, then iterate over that channel's set of clients. But different channel names can hash to the same bucket. Redis handles these collisions by "hash chaining", which means each bucket points to a linked list of channels.
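In code, the bucket index is just the hash masked down to the table size, and a lookup then walks that bucket's chain comparing keys. A simplified version (the real dictFind in dict.c also has to check two tables during a resize, which we'll get to below; the types here are illustrative):

```c
#include <string.h>

unsigned long hash_function(const char *key);  /* seeded randomly at start-up */

typedef struct entry {
    char *key;              /* e.g. the channel name */
    void *value;            /* e.g. the list of subscribed clients */
    struct entry *next;     /* next entry in this bucket's chain */
} entry;

typedef struct hash_table {
    entry **buckets;        /* array of 2^n bucket heads */
    unsigned long size;     /* always a power of two */
} hash_table;

/* Simplified lookup in a chained hash table with a power-of-two size. */
entry *table_find(hash_table *ht, const char *key) {
    unsigned long idx = hash_function(key) & (ht->size - 1);  /* hash mod 2^n */
    for (entry *e = ht->buckets[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) return e;   /* found the channel */
    }
    return NULL;                                  /* no such channel */
}
```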
In the example, both channels hashed to bucket 2, but it could have come out differently: Redis picks a random seed for its hash function at start-up. This protects you against collision attacks, in which a malicious user subscribes to a large number of channels that all hash to the same bucket, causing poor performance.
The keys in the channel hash table are strings, colored green, and the values are sets of clients, colored red. But “set” is also an abstract data structure; how is it implemented in Redis? Well, the set of clients is another linked list!
It’s nice to think of the strings “donuts” and “bagels” as embedded in the hash chain objects. But this is not true: each string has a separate allocation. Redis uses strings extensively, and has its own representation for them: “Simple Dynamic Strings”. This is a character array prefixed by its length and the number of free bytes. We can draw it like this:
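A rough picture of that layout in C (this is the older, pre-3.2 style header; Redis 3.2 actually has several header widths, such as sdshdr8 and sdshdr16, to save space on short strings, but the idea is the same):

```c
/* Roughly how a Simple Dynamic String is laid out in memory. */
struct sdshdr {
    unsigned int len;   /* bytes of buf currently in use */
    unsigned int free;  /* spare bytes available after the used part */
    char buf[];         /* the characters themselves, kept NUL-terminated */
};

/* The sds value Redis passes around is a pointer to buf, not to the header,
   so it can be handed to code expecting a plain char*; the header sits just
   before it in the same allocation. */
typedef char *sds;
```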
We are almost at the level of memory blocks, except for one thing: each client’s set of channels. Redis chooses to not use a linked list here; instead, Redis uses another hash table. The channel names are the keys of the table:
Why does Redis use a linked list to represent the channel’s client set, but a hash table to represent the client’s channel set? We’re not sure. We suspect the channel’s client set is a linked list because it’s optimized for publishing, where it iterates over the set. The client’s channel set is a hash table because it’s optimized for subscribe/unsubscribe, where it does a lookup in the set. Let us know if you have insights on this.
Notice also that the value pointers in each client's hash chain (yellow) are ignored: they are unused memory. Only the keys matter when a hash table is used to represent a set. The small memory waste is a fair price for the code reuse we gain.
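So, on the client's side, subscribing is just an insertion into its hash table with a NULL value (a sketch with hypothetical helpers; in the source this is a dictAdd call with NULL as the value):

```c
/* Sketch: a hash table used as a set. Only the key (the channel name) is
   meaningful; the value slot is filled with NULL and never read back. */
void client_remember_subscription(client *c, const char *channel) {
    map_insert(c->pubsub_channels, channel, NULL);
}

int client_is_subscribed(client *c, const char *channel) {
    return map_contains(c->pubsub_channels, channel);  /* key lookup only */
}
```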
Finally, we're pretty close to the truth: each block in the diagram represents a memory allocation in the redis-server process. Let's recap our PUBLISH and SUBSCRIBE algorithms:

- To PUBLISH, hash the channel name to get a hash chain (yellow). Iterate over the hash chain, comparing each channel name (green) to our target channel name. Once we've found our target channel name, get the corresponding list of clients (red). Iterate over the linked list of clients, sending the published message to each client (purple).
- To SUBSCRIBE, find the linked list of clients as before. Append the new client to the end of the linked list. (Actually, this is a constant-time operation, because the linked lists have a tail pointer.) Also, add the channel to the client's hash table.
Realtime hash tables!
Notice that the hash tables are different sizes, roughly proportional to how many elements they have. Redis resizes hash tables in response to their number of elements. But Redis is built for low latency, and resizing a hash table is a time-consuming operation. How can it resize the hash table without causing latency spikes?
Answer: Redis gradually resizes the hash table. It keeps two underlying
hash tables, the
old and the new. Consider this pubsub_channels
hash table in the middle of a
resize:
Whenever Redis performs an operation on the hash table (lookup, insert, delete …), it does a little bit of resizing work. It keeps track of how many old buckets have been moved to the new table, and on each operation, it moves a few more buckets over. This bounds the amount of work, so that Redis remains responsive.
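A sketch of the mechanism (in the source this is dictRehash and _dictRehashStep in dict.c; while a resize is in progress, lookups also have to check both tables; the types below are illustrative):

```c
unsigned long hash_function(const char *key);   /* seeded at start-up, as before */

typedef struct entry { char *key; void *value; struct entry *next; } entry;

typedef struct hash_table {
    entry **old_buckets;          /* table being drained */
    entry **new_buckets;          /* table being filled (e.g. twice the size) */
    unsigned long old_size, new_size;
    long rehash_idx;              /* next old bucket to move; -1 = not rehashing */
} hash_table;

/* Sketch: called at the start of every lookup/insert/delete while a resize
   is in progress. Moving one bucket at a time bounds the work per command. */
void rehash_step(hash_table *ht) {
    if (ht->rehash_idx < 0) return;              /* no resize in progress */
    entry *e = ht->old_buckets[ht->rehash_idx];
    while (e != NULL) {                          /* move this bucket's whole chain */
        entry *next = e->next;
        unsigned long idx = hash_function(e->key) & (ht->new_size - 1);
        e->next = ht->new_buckets[idx];
        ht->new_buckets[idx] = e;
        e = next;
    }
    ht->old_buckets[ht->rehash_idx] = NULL;
    if ((unsigned long)++ht->rehash_idx == ht->old_size)
        ht->rehash_idx = -1;                     /* done: the old table can be freed */
}
```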
Expensive unsubscribes
There's one more important command in Redis Pub/Sub: UNSUBSCRIBE. UNSUBSCRIBE does the inverse of SUBSCRIBE: the client will no longer receive messages published to that channel. How would you write UNSUBSCRIBE, using the data structures above? Here's how Redis does it:

- To UNSUBSCRIBE, find the linked list of clients for the channel, as before. Then iterate over the entire list until you find the client to remove.
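In sketch form, with the same hypothetical helpers (the real code is pubsubUnsubscribeChannel in pubsub.c, which uses listSearchKey to find the client's node):

```c
/* Sketch: unsubscribing scans the channel's client list until it finds this
   client's node, which is O(n) in the number of subscribers. */
void unsubscribe_channel(client *c, const char *channel) {
    set_remove(c->pubsub_channels, channel);     /* cheap: hash table delete */

    list *subscribers = map_lookup(server.pubsub_channels, channel);
    if (subscribers == NULL) return;
    for (list_node *n = list_first(subscribers); n != NULL; n = list_next(n)) {
        if (list_value(n) == c) {                /* found our client */
            list_delete_node(subscribers, n);
            break;
        }
    }
    if (list_length(subscribers) == 0) {
        map_delete(server.pubsub_channels, channel);  /* no subscribers left */
        list_free(subscribers);
    }
}
```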
The UNSUBSCRIBE operation is therefore O(n), where n is the number of subscribed clients. With a very large number of clients subscribed to a Redis channel, an UNSUBSCRIBE can be expensive. This means you should limit either the number of clients or the number of subscriptions each client is allowed. One of Pusher's important optimizations is de-duplicating subscriptions: millions of Pusher subscriptions are collapsed into a much smaller number of Redis subscriptions.
Redis could optimize this by using a hash table instead of a linked list to represent the set of subscribed clients. However, this might not be desirable, because publishes would become a little slower: iterating over a hash table is slower than iterating over a linked list. Redis optimizes for PUBLISH operations, since they are more common than subscription changes.
Pattern subscriptions
The original Redis Pub/Sub API provides PUBLISH, SUBSCRIBE, and UNSUBSCRIBE. Shortly afterwards, in this commit, Redis introduced "pattern subscriptions". Pattern subscriptions let a client subscribe to all channels matching a regex-like pattern, instead of only subscribing to a single literal channel name.
The important new command is PSUBSCRIBE. Now, if a client sends PSUBSCRIBE food.donuts.*, and a news outlet sends PUBLISH food.donuts.glazed 2-for-£2, the subscribed client will be notified, because food.donuts.glazed matches the pattern food.donuts.*.
The pattern subscription system is completely separate from the normal channel subscription system. Alongside the global pubsub_channels hash table, there is the global pubsub_patterns list. This is a linked list of pubsubPattern objects, each of which associates one pattern with one client. Similarly, each client object has a linked list of the patterns it is subscribed to. Here's what redis-server memory looks like after client B subscribes to drink?, and clients A and B subscribe to food.*:
There is a global linked list down the left-hand side, each node pointing to a pubsubPattern (deep red). Each pattern is stored as its literal string in memory. On the right-hand side, each client has its own linked list of patterns.
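The pubsubPattern node itself is tiny. Roughly (field names simplified; in the source the pattern is stored as a Redis object wrapping the pattern string):

```c
/* Rough shape of one pattern subscription: it ties a single pattern to a
   single client. server.pubsub_patterns is a plain linked list of these,
   and each client also keeps its own list of the patterns it subscribed to. */
typedef struct pubsub_pattern {
    client *subscriber;   /* the one client this node belongs to */
    char   *pattern;      /* the pattern text, stored as written, e.g. "food.*" */
} pubsub_pattern;
```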
Now, when a client sends PUBLISH food.donuts 5-for-$1, Redis will iterate through the global pubsub_patterns list, and test the string food.donuts against each pattern. For each successful match, Redis will send the message 5-for-$1 to the linked client.
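The pattern half of publishing is then a straight walk down that global list. A sketch, with glob_match standing in for Redis's stringmatchlen (a simplified version of it appears at the end of this section), and the other helpers hypothetical as before:

```c
/* Sketch: after the literal subscribers have been notified, test every
   pattern subscription in the global list against the published channel. */
void publish_to_patterns(const char *channel, const char *message) {
    for (list_node *n = list_first(server.pubsub_patterns); n != NULL; n = list_next(n)) {
        pubsub_pattern *p = list_value(n);
        if (glob_match(p->pattern, channel)) {
            /* Matched: queue the message for this one client. */
            queue_pattern_reply(p->subscriber, p->pattern, channel, message);
        }
    }
}
```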
This system may surprise you: multiple clients subscribed to the same pattern do not get grouped together! If 10,000 clients subscribe to food.*, you will get a linked list of 10,000 patterns, each of which is tested on every publish! This design assumes that the set of pattern subscriptions will be small and distinct.
Another cause for surprise is that patterns are stored in their surface syntax. They are not compiled (e.g. to DFAs). This is especially interesting since Redis's matching function, stringmatch, has some … interesting worst-cases. Here is how Redis tests the pattern *a*a*b against the string aa:
stringmatch("*a*a*b", "aa")
stringmatch("a*a*b", "aa")
stringmatch("*a*b", "a")
stringmatch("a*b", "a")
stringmatch("*b", "")
stringmatch("b", "")
false
stringmatch("a*b", "")
false
stringmatch("a*a*b", "a")
stringmatch("*a*b", "")
stringmatch("a*b", "")
false
stringmatch("a*a*b", "")
false
This malicious pattern with many “globs” causes an exponential blowup in the running time of the match! Redis’s pattern language could be compiled to a DFA, which would run in linear time. But it is not.
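To see where the blowup comes from, here is a stripped-down backtracking glob matcher. It is not Redis's stringmatchlen (which also handles character classes, escaping, and case-insensitive matching), but the '*' handling has the same shape: each '*' tries every possible split point and recurses, so stacking more '*'s multiplies the backtracking.

```c
#include <stdbool.h>

/* Simplified backtracking glob matcher (illustration only). */
static bool glob_match(const char *pattern, const char *str) {
    if (*pattern == '\0') return *str == '\0';
    if (*pattern == '*') {
        /* Try letting the '*' swallow zero, one, two, ... characters. */
        for (;; str++) {
            if (glob_match(pattern + 1, str)) return true;
            if (*str == '\0') return false;   /* nothing left to swallow */
        }
    }
    if (*str != '\0' && (*pattern == '?' || *pattern == *str))
        return glob_match(pattern + 1, str + 1);
    return false;
}
```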
In short, you should not expose Redis’s pattern subscriptions to untrusted clients, because there are at least two attack vectors: multiple pattern subscribes, and crafted patterns. At Pusher, we tread very carefully with Redis pattern subscriptions.
Conclusion
Redis Pub/Sub is an efficient way to distribute messages. But you should know what it is optimized for, and where the pitfalls are. To truly understand this, study the source! In short: only use Redis in a trusted environment, limit the number of clients, and handle pattern subscriptions with gloves.
In this post, we only looked at Redis as one single-threaded process. But Pusher handles billions of messages per day: too many for a single process. In our next post, we’ll see how Pusher scales Redis Pub/Sub. Stay tuned!