Here at Pusher, our new products, Beams (our push notification API) and Chatkit (our chat API), have recently undergone extensive benchmarking as part of going from Beta to GA. Both make use of PostgreSQL.

There are many good tutorials out there that cover Go's native SQL abstraction - how to run queries, along with other intricacies of the package. Most of them, however, skip over some rather production-relevant settings: the ones that control connection pooling.

Before we dig into what those are, a little more context.

A connection pool is a set of maintained connections that can be reused for future requests to the database. When you instruct sql.DB to run a query, it first checks whether there is an idle connection in the pool - if there is, it takes it and returns it to the pool at the end of the query. If there is no idle connection available, a new one is opened. The benefit of reusing a connection is that you don't incur the cost of establishing a new connection with every single query.
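To make the rest of this post concrete, here is a minimal sketch of opening a pooled sql.DB against Postgres. The connection string, the lib/pq driver and the devices table are assumptions for illustration only:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	// sql.Open doesn't connect yet - it returns a *sql.DB that manages
	// a pool of connections created lazily as queries need them.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Each query borrows an idle connection (or opens a new one) and
	// returns it to the pool once the result has been read.
	var count int
	if err := db.QueryRow("SELECT count(*) FROM devices").Scan(&count); err != nil {
		log.Fatal(err)
	}
	log.Println("devices:", count)
}
```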

Setting #1: SetMaxOpenConns

sets the maximum number of open connections to the database

By default this is unlimited! That is problematic: if a bug sends one of your workers into a hot loop, your entire system is at risk of crashing, as the database will start rejecting new connections at a certain point (usually dictated by the max_connections Postgres configuration value). Once this value is set, new queries will block until a connection becomes available.

You'll always want to set this value in production, and make it a fraction of what the database can handle (how big a fraction also depends on how many workers you're running). In AWS, the maximum number of connections is based on the instance's memory; you can check online for the limits for your database instance type.
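Continuing the sketch above, this is roughly what that might look like. The numbers are purely illustrative, not a recommendation:

```go
// Illustrative only: split the database's connection budget across the
// workers that share it, leaving headroom for other clients.
const (
	dbMaxConnections   = 360 // e.g. what SHOW max_connections reports
	workers            = 4
	maxOpenConnections = dbMaxConnections / workers
)

db.SetMaxOpenConns(maxOpenConnections)
```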

Whenever you have a connection pool (or a buffered channel for that matter), you should always track its current usage, so you can correlate potential issues with the exhaustion of the connection pool.

This can be done like so:

// Report pool usage as a fraction of the configured maximum, so
// exhaustion shows up as the ratio approaching 1.0.
metric.TrackFuncFloat("db.connection_pool_usage", func() float64 {
  numOpenConnections := float64(underlyingDB.Stats().OpenConnections)
  return numOpenConnections / float64(maxOpenConnections)
})

Which results in this graph:

Connection Pool Usage Maxes Graph

Setting #2: SetMaxIdleConns

sets the maximum number of connections in the idle connection pool

By default, the max idle connections is… you guessed it! Two. Yes, 2.

If the number of queries per second is high, then you’re likely to be in a situation where connections are being created and disposed of immediately due to this low default value.

Allowing a higher number of idle connections in the pool will improve performance because it makes it less likely that a new connection will need to be established from scratch — saving many, many milliseconds.

There's an opposing force at play here: the more connections you keep open, the more resources both your worker and the database will require. In our observations, we saw roughly a 1MB memory increase in the worker and a 2.5MB memory increase in the database per connection. Additionally, there is a small CPU cost to keeping those connections alive.

You'll want to set this value to be a fraction of the max open connections. Whether it's 25%, 50%, 75% (or sometimes even 100%) will depend on your expected load patterns, and also on the value of the next setting.
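For example, continuing the hypothetical numbers above, keeping half of the open-connection budget warm might look like this:

```go
// Keep up to half of the pool's connections idle and ready for reuse.
// 50% is only a starting point - tune it against your load patterns
// and the connection lifetime setting described below.
db.SetMaxIdleConns(maxOpenConnections / 2)
```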

If we allow for more idle connections, they are more likely to absorb a sudden increase in requests. The Go SQL library does not offer statistics on the number of new connections being created. However, because the driver is an interface, we can create a driver that decorates another one while tracking the number of new connections. Here's a quick example of how to achieve this:

// Open creates a new connection through the wrapped driver and records
// whether that succeeded or failed.
func (ncm *sqlConnectionsMetric) Open(name string) (driver.Conn, error) {
  conn, err := ncm.underlyingDriver.Open(name)

  if err != nil {
    ncm.metric.Increment("db.new_connection.error")
  } else {
    ncm.metric.Increment("db.new_connection.ok")
  }

  return conn, err
}
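For completeness, here is a sketch of how such a wrapper might be wired up. The Metric interface, the driver name and the use of lib/pq are assumptions - adapt them to your own metrics client and driver:

```go
package db

import (
	"database/sql"
	"database/sql/driver"

	"github.com/lib/pq"
)

// Metric is whatever metrics client you use; only Increment is needed here.
type Metric interface {
	Increment(name string)
}

// sqlConnectionsMetric decorates another driver.Driver, counting the
// connections it opens (its Open method is shown above).
type sqlConnectionsMetric struct {
	underlyingDriver driver.Driver
	metric           Metric
}

// OpenInstrumentedDB registers the decorated driver under its own name and
// opens the pool through it. sql.Register must only be called once per
// name, so call this from a single place (e.g. at startup).
func OpenInstrumentedDB(metrics Metric, dsn string) (*sql.DB, error) {
	sql.Register("postgres-instrumented", &sqlConnectionsMetric{
		underlyingDriver: &pq.Driver{},
		metric:           metrics,
	})
	return sql.Open("postgres-instrumented", dsn)
}
```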

And here’s this metric in production.

New Connection Rate

SetMaxIdleConns was increased to 50% of the maximum open connections around 13:30. As you can see, far fewer connections are now being established, and there are some periods of time (the gaps) where no new ones are needed at all.

Setting #3: SetConnMaxLifetime

sets the maximum amount of time a connection may be reused

By default, there's no limit on the connection age, but you'll want to set this if you're also setting the max idle connections. As a simple heuristic: the higher the max idle connections percentage, the lower the connection max lifetime you should set.

Setting this is also useful if you want to facilitate gracefully swapping databases behind a load balancer.
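As a sketch, using the 30-minute value mentioned below (the time package is the only extra import needed):

```go
// Recycle each connection after 30 minutes, so the pool gradually churns
// through fresh connections and will eventually pick up changes behind a
// load balancer.
db.SetConnMaxLifetime(30 * time.Minute)
```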

For Beams, we set this value to 30 minutes. To our surprise, it doesn't kill off the extra connections straight after a connection spike; rather, we see the number of connections being halved every 30 minutes:

Connection Pool Usage Maxes With Spike

And this was the API requests spike that caused the connection pool to reach 100%:

API Requests With Spike

One thing to note here is that because the connection pool had a small number of idle connections, it had to create a lot of new ones to accommodate the sudden increase in requests. Creating a new connection takes a significant amount of time relative to the query itself, so if more connections had been sitting idle, the spike wouldn't have been as pronounced as it was in this case.

Benchmark results

Here are some benchmark results from testing the Beams publish endpoint at 150 requests per second.

With the maximum idle connections at the default of 2:

Benchmark Default Max Idle Conns

Benchmark Default Max Idle Conns Metrics

And here's the same benchmark, but now with the maximum idle connections set to 50%:

Benchmark 50% Max Idle Conns

Benchmark 50% Max Idle Conns Metrics

From these benchmark results, you can see that the 99th percentile latency is roughly halved when the connection pool is allowed to grow to 50% of the limit. The database latencies are also much more consistent and less spiky.

Some final thoughts

  • Making use of these settings is important for a production-ready service. They will impact service stability and performance.
  • Go doesn’t seem to have good defaults for production environments, so you should default to checking the defaults.
  • If you have different types of load within a single service, consider having separate connection pools instead of a single one. That makes the idle connections a bit less efficient, but it prevents one type of load from saturating the pool and blocking all the other queries (a sketch follows after this list).
  • If your database use is bursty, consider batching requests: a single, more expensive operation is better than one hundred tiny ones (see the batching sketch below).
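On the separate-pools point, a minimal sketch might look like the following. The split and the numbers are hypothetical, and it assumes the Postgres driver is registered as in the earlier examples:

```go
// newPools creates two independent pools against the same database.
func newPools(dsn string) (apiDB, batchDB *sql.DB, err error) {
	apiDB, err = sql.Open("postgres", dsn)
	if err != nil {
		return nil, nil, err
	}
	// Latency-sensitive API queries get most of the connection budget.
	apiDB.SetMaxOpenConns(60)
	apiDB.SetMaxIdleConns(30)

	batchDB, err = sql.Open("postgres", dsn)
	if err != nil {
		return nil, nil, err
	}
	// Background/batch work gets a smaller, separate budget so a burst
	// of it can't starve the API pool.
	batchDB.SetMaxOpenConns(30)
	batchDB.SetMaxIdleConns(5)

	return apiDB, batchDB, nil
}
```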
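And for batching, here is a hedged sketch that collapses many tiny inserts into one statement. The device_tokens table and its single column are made up for illustration, and it assumes database/sql and fmt are imported:

```go
// insertTokens writes a whole batch of tokens in a single round-trip
// instead of issuing one INSERT per token.
func insertTokens(db *sql.DB, tokens []string) error {
	if len(tokens) == 0 {
		return nil
	}

	var (
		query = "INSERT INTO device_tokens (token) VALUES "
		args  = make([]interface{}, 0, len(tokens))
	)
	for i, t := range tokens {
		if i > 0 {
			query += ", "
		}
		query += fmt.Sprintf("($%d)", i+1) // Postgres positional placeholders
		args = append(args, t)
	}

	_, err := db.Exec(query, args...)
	return err
}
```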